Algorithm for collecting total events within the last certain time

Algorithm for collecting total events within the last certain time - algorithm

I am facing an algorithm problem.
We have a task that runs every 10ms and during the running, an event can happen or not happen. Is there any simple algorithm that allows us to keep track of how many time an event is triggered within the latest, say, 1 second?
The only idea that I have is to implement an array and save all the events. As we are programming embedded systems, there is not enough space...
Thanks in advance.

an array of 13 bytes for a second worth of events in 10ms steps.
consider it an array of 104 bits marking 0ms to 104ms
if the event occurs mark the bit and increment to the next time, else just increment to next bit/byte.
if you want ... run length encode after each second to offload the event bits into another value.
or ... treat it as a circular buffer and keep the count available for query.
or both
You could reduce the array size to match the space available.
It is not clear if an event could occur multiple times while your task was running, or if it is always 10ms between events.

This is more-or-less what Dtyree and Weeble have suggested, but an example implementation may help ( C code for illustration):
#include <stdint.h>
#include <stdbool.h>
#define HISTORY_LENGTH 100 // 1 second when called every 10ms
int rollingcount( bool event )
{
static uint8_t event_history[(HISTORY_LENGTH+7) / 8] ;
static int next_history_bit = 0 ;
static int event_count = 0 ;
// Get history byte index and bit mask
int history_index = next_history_bit >> 3 ; // ">> 3" is same as "/ 8" but often faster
uint8_t history_mask = 1 << (next_history_bit & 0x7) ; // "& 0x07" is same as "% 8" but often faster
// Get current bit value
bool history_bit = (event_history[history_index] & history_mask) != 0 ;
// If oldest history event is not the same as new event, adjust count
if( history_bit != event )
{
if( event )
{
// Increment count for 0->1
event_count++ ;
// Replace oldest bit with 1
event_history[history_index] |= history_mask ;
}
else
{
// decrement count for 1->0
event_count-- ;
// Replace oldest bit with 0
event_history[history_index] &= ~history_mask ;
}
}
// increment to oldest history bit
next_history_bit++ ;
if( next_history_bit >= HISTORY_LENGTH ) // Could use "next_history_bit %= HISTORY_COUNT" here, but may be expensive of some processors
{
next_history_bit = 0 ;
}
return event_count ;
}
For a 100 sample history, it requires 13 bytes plus two integers of statically allocated memory, I have used int for generality, but in this case uint8_t counters would suffice. In addition there are three stack variables, and again the use of int is not necessary if you need to really optimise memory use. So in total it is possible to use as little as 15 bytes plus three bytes of stack. The event argument may or may not be passed on the stack, then there is the function call return address, but again that depends on the calling convention of your compiler/processor.

You need some kind of list/queue etc, but a ringbuffer has probably the best performance.
You need to store 100 counters (1 for each time period of 10 ms during the last second) and a current counter.
Ringbuffer solution:
(I used pseudo code).
Create a counter_array of 100 counters (initially filled with 0's).
int[100] counter_array;
current_counter = 0
During the 10 ms cycle:
counter_array[current_counter] = 0;
current_counter++;
For every event:
counter_array[current_counter]++
To check the number of events during the last s, take the sum of counter_array

Can you afford an array of 100 booleans? Perhaps as a bit field? As long as you can afford the space cost, you can track the number of events in constant time:
Store:
A counter C, initially 0.
The array of booleans B, of size equal to the number of intervals you want to track, i.e. 100, initially all false.
An index I, initially 0.
Each interval:
read the boolean at B[i], and decrement C if it's true.
set the boolean at B[i] to true if the event occurred in this interval, false otherwise.
Increment C if the event occurred in this interval.
When I reaches 100, reset it to 0.
That way you at least avoid scanning the whole array every interval.
EDIT - Okay, so you want to track events over the last 3 minutes (180s, 18000 intervals). Using the above algorithm and cramming the booleans into a bit-field, that requires total storage:
2 byte unsigned integer for C
2 byte unsigned integer for I
2250 byte bit-field for B
That's pretty much unavoidable if you require to have a precise count of the number of events in the last 180.0 seconds at all times. I don't think it would be hard to prove that you need all of that information to be able to give an accurate answer at all times. However, if you could live with knowing only the number of events in the last 180 +/- 2 seconds, you could instead reduce your time resolution. Here's a detailed example, expanding on my comment below.
The above algorithm generalizes:
Store:
A counter C, initially 0.
The array of counters B, of size equal to the number of intervals you want to track, i.e. 100, initially all 0.
An index I, initially 0.
Each interval:
read B[i], and decrement C by that amount.
write the number of events that occurred this interval into B[i].
Increment C by the number of events that occurred this interval.
When I reaches the length of B, reset it to 0.
If you switch your interval to 2s, then in that time 0-200 events might occur. So each counter in the array could be a one-byte unsigned integer. You would have 90 such intervals over 3 minutes, so your array would need 90 elements = 90 bytes.
If you switch your interval to 150ms, then in that time 0-15 events might occur. If you are pressed for space, you could cram this into a half-byte unsigned integer. You would have 1200 such intervals over 3 minutes, so your array would need 1200 elements = 600 bytes.

Will the following work for you application?
A rolling event counter that increments every event.
In the routine that runs every 10ms, you compare the current event counter value with the event counter value stored the last time the routine ran.
That tells you how many events occurred during the 10ms window.

Related

Subtract a number's digits from the number until it reaches 0

Can anyone help me with some algorithm for this problem?
We have a big number (19 digits) and, in a loop, we subtract one of the digits of that number from the number itself.
We continue to do this until the number reaches zero. We want to calculate the minimum number of subtraction that makes a given number reach zero.
The algorithm must respond fast, for a 19 digits number (10^19), within two seconds. As an example, providing input of 36 will give 7:
1. 36 - 6 = 30
2. 30 - 3 = 27
3. 27 - 7 = 20
4. 20 - 2 = 18
5. 18 - 8 = 10
6. 10 - 1 = 9
7. 9 - 9 = 0
Thank you.

The minimum number of subtractions to reach zero makes this, I suspect, a very thorny problem, one that will require a great deal of backtracking potential solutions, making it possibly too expensive for your time limitations.
But the first thing you should do is a sanity check. Since the largest digit is a 9, a 19-digit number will require about 1018 subtractions to reach zero. Code up a simple program to continuously subtract 9 from 1019 until it becomes less than ten. If you can't do that within the two seconds, you're in trouble.
By way of example, the following program (a):
#include <stdio.h>
int main (int argc, char *argv[]) {
unsigned long long x = strtoull(argv[1], NULL, 10);
x /= 1000000000;
while (x > 9)
x -= 9;
return x;
}
when run with the argument 10000000000000000000 (1019), takes a second and a half clock time (and CPU time since it's all calculation) even at gcc insane optimisation level of -O3:
real 0m1.531s
user 0m1.528s
sys 0m0.000s
And that's with the one-billion divisor just before the while loop, meaning the full number of iterations would take about 48 years.
So a brute force method isn't going to help here, what you need is some serious mathematical analysis which probably means you should post a similar question over at https://math.stackexchange.com/ and let the math geniuses have a shot.
(a) If you're wondering why I'm getting the value from the user rather than using a constant of 10000000000000000000ULL, it's to prevent gcc from calculating it at compile time and turning it into something like:
mov $1, %eax
Ditto for the return x which will prevent it noticing I don't use the final value of x and hence optimise the loop out of existence altogether.

I don't have a solution that can solve 19 digit numbers in 2 seconds. Not even close. But I did implement a couple of algorithms (including a dynamic programming algorithm that solves for the optimum), and gained some insight that I believe is interesting.
Greedy Algorithm
As a baseline, I implemented a greedy algorithm that simply picks the largest digit in each step:
uint64_t countGreedy(uint64_t inputVal) {
uint64_t remVal = inputVal;
uint64_t nStep = 0;
while (remVal > 0) {
uint64_t digitVal = remVal;
uint_fast8_t maxDigit = 0;
while (digitVal > 0) {
uint64_t nextDigitVal = digitVal / 10;
uint_fast8_t digit = digitVal - nextDigitVal * 10;
if (digit > maxDigit) {
maxDigit = digit;
}
digitVal = nextDigitVal;
}
remVal -= maxDigit;
++nStep;
}
return nStep;
}
Dynamic Programming Algorithm
The idea for this is that we can calculate the optimum incrementally. For a given value, we pick a digit, which adds one step to the optimum number of steps for the value with the digit subtracted.
With the target function (optimum number of steps) for a given value named optSteps(val), and the digits of the value named d_i, the following relationship holds:
optSteps(val) = 1 + min(optSteps(val - d_i))
This can be implemented with a dynamic programming algorithm. Since d_i is at most 9, we only need the previous 9 values to build on. In my implementation, I keep a circular buffer of 10 values:
static uint64_t countDynamic(uint64_t inputVal) {
uint64_t minSteps[10] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
uint_fast8_t digit0 = 0;
for (uint64_t val = 10; val <= inputVal; ++val) {
digit0 = val % 10;
uint64_t digitVal = val;
uint64_t minPrevStep = 0;
bool prevStepSet = false;
while (digitVal > 0) {
uint64_t nextDigitVal = digitVal / 10;
uint_fast8_t digit = digitVal - nextDigitVal * 10;
if (digit > 0) {
uint64_t prevStep = 0;
if (digit > digit0) {
prevStep = minSteps[10 + digit0 - digit];
} else {
prevStep = minSteps[digit0 - digit];
}
if (!prevStepSet || prevStep < minPrevStep) {
minPrevStep = prevStep;
prevStepSet = true;
}
}
digitVal = nextDigitVal;
}
minSteps[digit0] = minPrevStep + 1;
}
return minSteps[digit0];
}
Comparison of Results
This may be considered a surprise: I ran both algorithms on all values up to 1,000,000. The results are absolutely identical. This suggests that the greedy algorithm actually calculates the optimum.
I don't have a formal proof that this is indeed true for all possible values. It intuitively kind of makes sense to me. If in any given step, you choose a smaller digit than the maximum, you compromise the immediate progress with the goal of getting into a more favorable situation that allows you to catch up and pass the greedy approach. But in all the scenarios I thought about, the situation after taking a sub-optimal step just does not get significantly more favorable. It might make the next step bigger, but that is at most enough to get even again.
Complexity
While both algorithms look linear in the size of the value, they also loop over all digits in the value. Since the number of digits corresponds to log(n), I believe the complexity is O(n * log(n)).
I think it's possible to make it linear by keeping counts of the frequency of each digit, and modifying them incrementally. But I doubt it would actually be faster. It requires more logic, and turns a loop over all digits in the value (which is in the range of 2-19 for the values we are looking at) into a fixed loop over 10 possible digits.
Runtimes
Not surprisingly, the greedy algorithm is faster to calculate a single value. For example, for value 1,000,000,000, the runtimes on my MacBook Pro are:
greedy: 3 seconds
dynamic: 36 seconds
On the other hand, the dynamic programming approach is obviously much faster at calculating all the values, since its incremental approach needs them as intermediate results anyway. For calculating all values from 10 to 1,000,000:
greedy: 19 minutes
dynamic: 0.03 seconds
As already shown in the runtimes above, the greedy algorithm gets about as high as 9 digit input values within the targeted runtime of 2 seconds. The implementations aren't really tuned, and it's certainly possible to squeeze out some more time, but it would be fractional improvements.
Ideas
As already explored in another answer, there's no chance of getting the result for 19 digit numbers in 2 seconds by subtracting digits one by one. Since we subtract at most 9 in each step, completing this for a value of 10^19 needs more than 10^18 steps. We mostly use computers that perform in the rough range of 10^9 operations/second, which suggests that it would take about 10^9 seconds.
Therefore, we need something that can take shortcuts. I can think of scenarios where that's possible, but haven't been able to generalize it to a full strategy so far.
For example, if your current value is 9999, you know that you can subtract 9 until you reach 9000. So you can calculate that you will make 112 steps ((9999 - 9000) / 9 + 1) where you subtract 9, which can be done in a few operations.

As said in comments already, and agreeing with #paxdiablo’s other answer, I’m not sure if there is an algorithm to find the ideal solution without some backtracking; and the size of the number and the time constraint might be tough as well.
A general consideration though: You might want to find a way to decide between always subtracting the highest digit (which will decrease your current number by the largest possible amount, obviously), and by looking at your current digits and subtracting which of those will give you the largest “new” digit.
Say, your current number only consists of digits between 0 and 5 – then you might be tempted to subtract the 5 to decrease your number by the highest possible value, and continue with the next step. If the last digit of your current number is 3 however, then you might want to subtract 4 instead – since that will give you 9 as new digit at the end of the number, instead of “only” 8 you would be getting if you subtracted 5.
Whereas if you have a 2 and two 9 in your digits already, and the last digit is a 1 – then you might want to subtract the 9 anyway, since you will be left with the second 9 in the result (at least in most cases; in some edge cases it might get obliterated from the result as well), so subtracting the 2 instead would not have the advantage of giving you a “high” 9 that you would otherwise not have in the next step, and would have the disadvantage of not lowering your number by as high an amount as subtracting the 9 would …
But every digit you subtract will not only affect the next step directly, but the following steps indirectly – so again, I doubt there is a way to always chose the ideal digit for the current step without any backtracking or similar measures.

Verilog: Minimal (hardware) algorithm for multiplying a binary input to its delayed form

I have a binary input in (1 bit serial input) which I want to delay by M clock pulses and then multiply (AND) the 2 signals. In other words, I want to evaluate the sum:
sum(in[n]*in[n+M])
where n is expressed in terms of number of clock pulses.
The most straightforward way is to store in a memory buffer in_dly the latest M samples of in. In Verilog, this would be something like:
always #(posedge clock ...)
...
in_dly[M-1:0] <= {in_dly[M-2:0], in};
if (in_dly[M-1] & in)
sum <= sum + 'd1;
...
While this works in theory, with large values of M (can be ~2000), the size of the buffer is not practical. However, I was thinking to take advantage of the fact that the input signal is 1 bit and it is expected to toggle only a few times (~1-10) during M samples.
This made me think of storing the toggle times from 2k*M to (2k+1)*M in an array a and from (2k+1)*M to (2k+2)*M in an array b (k is just an integer used to generalize the idea):
reg [10:0] a[0:9]; //2^11 > max(M)=2000 and "a" has max 10 elements
reg [10:0] b[0:9]; //same as "a"
Therefore, during M samples, in = 'b1 during intervals [a[1],a[2]], [a[3],a[4]], etc. Similarly, during the next M samples, the input is high during [b[1],b[2]], [b[3],b[4]], etc. Now, the sum is the "overlapping" of these intervals:
min(b[2],a[2])-max(b[1],a[1]), if b[2]>a[1] and b[1]<a[2]; 0 otherwise
Finally, the array b becomes the new array a and the next M samples are evaluated and stored into b. The process is repeated until the end of in.
Comparing this "optimized" method to the initial one, there is a significant gain in hardware: initially 2000 bits were stored, and now 220 bits are stored (for this example). However, the number is still large and not very practical..
I would greatly appreciate if somebody could suggest a more optimal (hardware-wise) way or a simpler way (algorithm-wise) of doing this operation. Thank you in advance!
Edit:
Thanks to Alexey's idea, I optimized the algorithm as follows:
Given a set of delays M[i] for i=1 to 10 with M[1]<M[2]<..<M[10], and an input binary array in, we need to compute the outputs:
y[i] = sum(in[n]*in[n+M[i]]) for n=1 to length(in).
We then define 2 empty arrays a[j] and b[j] with j=1,~5. Whenever in has a 0->1 transition, the smallest index empty element a[j] is "activated" and will increment at each clock cycle. Same goes for b[j] at 1->0 transitions. Basically, the pairs (a[j],b[j]) represent the portions of in equal to 1.
Whenever a[j] equals M[i], the sum y[i] will increment by 1 at each cycle while in = 1, until b[j] equals M[i]. Once a[j] equals M[10], a[j] is cleared. Same goes for b[j]. This is repeated until the end of in.
Based on the same numerical assumptions as the initial question, a total of 10 arrays (a and b) of 11 bits allow the computation of the 10 sums, corresponding to 10 different delays M[i]. This is almost 20 times better (in terms of resources used) than my initial approach. Any further optimization or idea is welcomed!

Try this:
make array A,
every time when in==1 get free A element and write M to it.
every clock decrement all non-zero A elements,
once any decremented element becomes zero, test in, if in==1 - sum++.
Edit: algorithm above intended for input like
- 00000000000010000000100010000000, while LLDinu realy needs
- 11111111111110000000011111000000, so here is modified algorithm:
make array (ring buffer) A,
every time when in toggles, get free A element and write M to it.
every clock decrement all non-zero A elements,
every clock test in, if in==1 and number of non-zero A elements is even - sum++.

how to read all 1's in an Array of 1's and 0's spread-ed all over the array randomly

I have an Array with 1 and 0 spread over the array randomly.
int arr[N] = {1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,0,0,0,1....................N}
Now I want to retrive all the 1's in the array as fast as possible, but the condition is I should not loose the exact position(based on index) of the array , so sorting option not valid.
So the only option left is linear searching ie O(n) , is there anything better than this.
The main problem behind linear scan is , I need to run the scan even
for X times. So I feel I need to have some kind of other datastructure
which maintains this list once the first linear scan happens, so that
I need not to run the linear scan again and again.
Let me be clear about final expectations-
I just need to find the number of 1's in a certain range of array , precisely I need to find numbers of 1's in the array within range of 40-100. So this can be random range and I need to find the counts of 1 within that range. I can't do sum and all as I need to iterate over the array over and over again because of different range requirements

I'm surprised you considered sorting as a faster alternative to linear search.
If you don't know where the ones occur, then there is no better way than linear searching. Perhaps if you used bits or char datatypes you could do some optimizations, but it depends on how you want to use this.
The best optimization that you could do on this is to overcome branch prediction. Because each value is zero or one, you can use it to advance the index of the array that is used to store the one-indices.
Simple approach:
int end = 0;
int indices[N];
for( int i = 0; i < N; i++ )
{
if( arr[i] ) indices[end++] = i; // Slow due to branch prediction
}
Without branching:
int end = 0;
int indices[N];
for( int i = 0; i < N; i++ )
{
indices[end] = i;
end += arr[i];
}
[edit] I tested the above, and found the version without branching was almost 3 times faster (4.36s versus 11.88s for 20 repeats on a randomly populated 100-million element array).
Coming back here to post results, I see you have updated your requirements. What you want is really easy with a dynamic programming approach...
All you do is create a new array that is one element larger, which stores the number of ones from the beginning of the array up to (but not including) the current index.
arr : 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 1
count : 0 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 5 6 6 6 6 7
(I've offset arr above so it lines up better)
Now you can compute the number of 1s in any range in O(1) time. To compute the number of 1s between index A and B, you just do:
int num = count[B+1] - count[A];
Obviously you can still use the non-branch-prediction version to generate the counts initially. All this should give you a pretty good speedup over the naive approach of summing for every query:
int *count = new int[N+1];
int total = 0;
count[0] = 0;
for( int i = 0; i < N; i++ )
{
total += arr[i];
count[i+1] = total;
}
// to compute the ranged sum:
int range_sum( int *count, int a, int b )
{
if( b < a ) return range_sum(b,a);
return count[b+1] - count[a];
}

Well one time linear scanning is fine. Since you are looking for multiple scans across ranges of array I think that can be done in constant time. Here you go:
Scan the array and create a bitmap where key = key of array = sequence (1,2,3,4,5,6....).The value storedin bitmap would be a tuple<IsOne,cumulativeSum> where isOne is whether you have a one in there and cumulative Sum is addition of 1's as and wen you encounter them
Array = 1 1 0 0 1 0 1 1 1 0 1 0
Tuple: (1,1) (1,2) (0,2) (0,2) (1,3) (0,3) (1,4) (1,5) (1,6) (0,6) (1,7) (0,7)
CASE 1: When lower bound of cumulativeSum has a 0. Number of 1's [6,11] =
cumulativeSum at 11th position - cumulativeSum at 6th position = 7 - 3 = 4
CASE 2: When lower bound of cumulativeSum has a 1. Number of 1's [2,11] =
cumulativeSum at 11th position - cumulativeSum at 2nd position + 1 = 7-2+1 = 6
Step 1 is O(n)
Step 2 is 0(1)
Total complexity is linear no doubt but for your task where you have to work with the ranges several times the above Algorithm seems to be better if you have ample memory :)

Does it have to be a simple linear array data structure? Or can you create your own data structure which happens to have the desired properties, for which you're able to provide the required API, but whose implementation details can be hidden (encapsulated)?
If you can implement your own and if there is some guaranteed sparsity (to either 1s or 0s) then you might be able to offer better than linear performance. I see that you want to preserve (or be able to regenerate) the exact stream, so you'll have to store an array or bitmap or run-length encoding for that. (RLE will be useless if the stream is actually random rather than arbitrary but could be quite useful if there are significant sparsity or patterns with long strings of one or the other. For example a black&white raster of a bitmapped image is often a good candidate for RLE).
Let's say that your guaranteed that the stream will be sparse --- that no more than 10%, for example, of the bits will be 1s (or, conversely that more than 90% will be). If that's the case then you might model your solution on an RLE and maintain a count of all 1s (simply incremented as you set bits and decremented as you clear them). If there might be a need to quickly get the number of set bits for arbitrary ranges of these elements then instead of a single counter you can have a conveniently sized array of counters for partitions of the stream. (Conveniently-sized, in this case, means something which fits easily within memory, within your caches, or register sets, but which offers a reasonable trade off between computing a sum (all the partitions fully within the range) and the linear scan. The results for any arbitrary range is the sum of all the partitions fully enclosed by the range plus the results of linear scans for any fragments that are not aligned on your partition boundaries.
For a very, very, large stream you could even have a multi-tier "index" of partition sums --- traversing from the largest (most coarse) granularity down toward the "fragments" to either end (using the next layer of partition sums) and finishing with the linear search of only the small fragments.
Obviously such a structure represents trade offs between the complexity of building and maintaining the structure (inserting requires additional operations and, for an RLE, might be very expensive for anything other than appending/prepending) vs the expense of performing arbitrarily long linear search/increment scans.

If:
the purpose is to be able to find the number of 1s in the array at any time,
given that relatively few of the values in the array might change between one moment when you want to know the number and another moment, and
if you have to find the number of 1s in a changing array of n values m times,
... you can certainly do better than examining every cell in the array m times by using a caching strategy.
The first time you need the number of 1s, you certainly have to examine every cell, as others have pointed out. However, if you then store the number of 1s in a variable (say sum) and track changes to the array (by, for instance, requiring that all array updates occur through a specific update() function), every time a 0 is replaced in the array with a 1, the update() function can add 1 to sum and every time a 1 is replaced in the array with a 0, the update() function can subtract 1 from sum.
Thus, sum is always up-to-date after the first time that the number of 1s in the array is counted and there is no need for further counting.
(EDIT to take the updated question into account)
If the need is to return the number of 1s in a given range of the array, that can be done with a slightly more sophisticated caching strategy than the one I've just described.
You can keep a count of the 1s in each subset of the array and update the relevant subset count whenever a 0 is changed to a 1 or vice versa within that subset. Finding the total number of 1s in a given range within the array would then be a matter of adding the number of 1s in each subset that is fully contained within the range and then counting the number of 1s that are in the range but not in the subsets that have already been counted.
Depending on circumstances, it might be worthwhile to have a hierarchical arrangement in which (say) the number of 1s in the whole array is at the top of the hierarchy, the number of 1s in each 1/q th of the array is in the second level of the hierarchy, the number of 1s in each 1/(q^2) th of the array is in the third level of the hierarchy, etc. e.g. for q = 4, you would have the total number of 1s at the top, the number of 1s in each quarter of the array at the second level, the number of 1s in each sixteenth of the array at the third level, etc.

Are you using C (or derived language)? If so, can you control the encoding of your array? If, for example, you could use a bitmap to count. The nice thing about a bitmap, is that you can use a lookup table to sum the counts, though if your subrange ends aren't divisible by 8, you'll have to deal with end partial bytes specially, but the speedup will be significant.
If that's not the case, can you at least encode them as single bytes? In that case, you may be able to exploit sparseness if it exists (more specifically, the hope that there are often multi index swaths of zeros).
So for:
u8 input = {1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,0,0,0,1....................N};
You can write something like (untested):
uint countBytesBy1FromTo(u8 *input, uint start, uint stop)
{ // function for counting one byte at a time, use with range of less than 4,
// use functions below for longer ranges
// assume it's just one's and zeros, otherwise we have to test/branch
uint sum;
u8 *end = input + stop;
for (u8 *each = input + start; each < end; each++)
sum += *each;
return sum;
}
countBytesBy8FromTo(u8 *input, uint start, uint stop)
{
u64 *chunks = (u64*)(input+start);
u64 *end = chunks + ((start - stop) >> 3);
uint sum = countBytesBy1FromTo((u8*)end, 0, stop - (u8*)end);
for (; chunks < end; chunks++)
{
if (*chunks)
{
sum += countBytesBy1FromTo((u8*)chunks, 0, 8);
}
}
}
The basic trick, is exploiting the ability to cast slices of your target array to single entities your language can look at in one swoop, and test by inference if ANY of the values of it are zeros, and then skip the whole block. The more zeros, the better it will work. In the case where your large cast integer always has at least one, this approach just adds overhead. You might find that using a u32 is better for your data. Or that adding a u32 test between the 1 and 8 helps. For datasets where zeros are much more common than ones, I've used this technique to great advantage.

Why is sorting invalid? You can clone the original array, sort the clone, and count and/or mark the locations of the 1s as needed.

Sorting 1 million 8-decimal-digit numbers with 1 MB of RAM

I have a computer with 1 MB of RAM and no other local storage. I must use it to accept 1 million 8-digit decimal numbers over a TCP connection, sort them, and then send the sorted list out over another TCP connection.
The list of numbers may contain duplicates, which I must not discard. The code will be placed in ROM, so I need not subtract the size of my code from the 1 MB. I already have code to drive the Ethernet port and handle TCP/IP connections, and it requires 2 KB for its state data, including a 1 KB buffer via which the code will read and write data. Is there a solution to this problem?
Sources Of Question And Answer:
slashdot.org
cleaton.net

There is one rather sneaky trick not mentioned here so far. We assume that you have no extra way to store data, but that is not strictly true.
One way around your problem is to do the following horrible thing, which should not be attempted by anyone under any circumstances: Use the network traffic to store data. And no, I don't mean NAS.
You can sort the numbers with only a few bytes of RAM in the following way:
First take 2 variables: COUNTER and VALUE.
First set all registers to 0;
Every time you receive an integer I, increment COUNTER and set VALUE to max(VALUE, I);
Then send an ICMP echo request packet with data set to I to the router. Erase I and repeat.
Every time you receive the returned ICMP packet, you simply extract the integer and send it back out again in another echo request. This produces a huge number of ICMP requests scuttling backward and forward containing the integers.
Once COUNTER reaches 1000000, you have all of the values stored in the incessant stream of ICMP requests, and VALUE now contains the maximum integer. Pick some threshold T >> 1000000. Set COUNTER to zero. Every time you receive an ICMP packet, increment COUNTER and send the contained integer I back out in another echo request, unless I=VALUE, in which case transmit it to the destination for the sorted integers. Once COUNTER=T, decrement VALUE by 1, reset COUNTER to zero and repeat. Once VALUE reaches zero you should have transmitted all integers in order from largest to smallest to the destination, and have only used about 47 bits of RAM for the two persistent variables (and whatever small amount you need for the temporary values).
I know this is horrible, and I know there can be all sorts of practical issues, but I thought it might give some of you a laugh or at least horrify you.

Here's some working C++ code which solves the problem.
Proof that the memory constraints are satisfied:
Editor: There is no proof of the maximum memory requirements offered by the author either in this post or in his blogs. Since the number of bits necessary to encode a value depends on the values previously encoded, such a proof is likely non-trivial. The author notes that the largest encoded size he could stumble upon empirically was 1011732, and chose the buffer size 1013000 arbitrarily.
typedef unsigned int u32;
namespace WorkArea
{
static const u32 circularSize = 253250;
u32 circular[circularSize] = { 0 }; // consumes 1013000 bytes
static const u32 stageSize = 8000;
u32 stage[stageSize]; // consumes 32000 bytes
...
Together, these two arrays take 1045000 bytes of storage. That leaves 1048576 - 1045000 - 2×1024 = 1528 bytes for remaining variables and stack space.
It runs in about 23 seconds on my Xeon W3520. You can verify that the program works using the following Python script, assuming a program name of sort1mb.exe.
from subprocess import *
import random
sequence = [random.randint(0, 99999999) for i in xrange(1000000)]
sorter = Popen('sort1mb.exe', stdin=PIPE, stdout=PIPE)
for value in sequence:
sorter.stdin.write('%08d\n' % value)
sorter.stdin.close()
result = [int(line) for line in sorter.stdout]
print('OK!' if result == sorted(sequence) else 'Error!')
A detailed explanation of the algorithm can be found in the following series of posts:
1MB Sorting Explained
Arithmetic Coding and the 1MB Sorting Problem
Arithmetic Encoding Using Fixed-Point Math

Please see the first correct answer or the later answer with arithmetic encoding. Below you may find some fun, but not a 100% bullet-proof solution.
This is quite an interesting task and here is an another solution. I hope somebody would find the result useful (or at least interesting).
Stage 1: Initial data structure, rough compression approach, basic results
Let's do some simple math: we have 1M (1048576 bytes) of RAM initially available to store 10^6 8 digit decimal numbers. [0;99999999]. So to store one number 27 bits are needed (taking the assumption that unsigned numbers will be used). Thus, to store a raw stream ~3.5M of RAM will be needed. Somebody already said it doesn't seem to be feasible, but I would say the task can be solved if the input is "good enough". Basically, the idea is to compress the input data with compression factor 0.29 or higher and do sorting in a proper manner.
Let's solve the compression issue first. There are some relevant tests already available:
http://www.theeggeadventure.com/wikimedia/index.php/Java_Data_Compression
"I ran a test to compress one million consecutive integers using
various forms of compression. The results are as follows:"
None 4000027
Deflate 2006803
Filtered 1391833
BZip2 427067
Lzma 255040
It looks like LZMA (Lempel–Ziv–Markov chain algorithm) is a good choice to continue with. I've prepared a simple PoC, but there are still some details to be highlighted:
Memory is limited so the idea is to presort numbers and use
compressed buckets (dynamic size) as temporary storage
It is easier to achieve a better compression factor with presorted
data, so there is a static buffer for each bucket (numbers from the buffer are to be sorted before LZMA)
Each bucket holds a specific range, so the final sort can be done for
each bucket separately
Bucket's size can be properly set, so there will be enough memory to
decompress stored data and do the final sort for each bucket separately
Please note, attached code is a POC, it can't be used as a final solution, it just demonstrates the idea of using several smaller buffers to store presorted numbers in some optimal way (possibly compressed). LZMA is not proposed as a final solution. It is used as a fastest possible way to introduce a compression to this PoC.
See the PoC code below (please note it just a demo, to compile it LZMA-Java will be needed):
public class MemorySortDemo {
static final int NUM_COUNT = 1000000;
static final int NUM_MAX = 100000000;
static final int BUCKETS = 5;
static final int DICT_SIZE = 16 * 1024; // LZMA dictionary size
static final int BUCKET_SIZE = 1024;
static final int BUFFER_SIZE = 10 * 1024;
static final int BUCKET_RANGE = NUM_MAX / BUCKETS;
static class Producer {
private Random random = new Random();
public int produce() { return random.nextInt(NUM_MAX); }
}
static class Bucket {
public int size, pointer;
public int[] buffer = new int[BUFFER_SIZE];
public ByteArrayOutputStream tempOut = new ByteArrayOutputStream();
public DataOutputStream tempDataOut = new DataOutputStream(tempOut);
public ByteArrayOutputStream compressedOut = new ByteArrayOutputStream();
public void submitBuffer() throws IOException {
Arrays.sort(buffer, 0, pointer);
for (int j = 0; j < pointer; j++) {
tempDataOut.writeInt(buffer[j]);
size++;
}
pointer = 0;
}
public void write(int value) throws IOException {
if (isBufferFull()) {
submitBuffer();
}
buffer[pointer++] = value;
}
public boolean isBufferFull() {
return pointer == BUFFER_SIZE;
}
public byte[] compressData() throws IOException {
tempDataOut.close();
return compress(tempOut.toByteArray());
}
private byte[] compress(byte[] input) throws IOException {
final BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(input));
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(compressedOut));
final Encoder encoder = new Encoder();
encoder.setEndMarkerMode(true);
encoder.setNumFastBytes(0x20);
encoder.setDictionarySize(DICT_SIZE);
encoder.setMatchFinder(Encoder.EMatchFinderTypeBT4);
ByteArrayOutputStream encoderPrperties = new ByteArrayOutputStream();
encoder.writeCoderProperties(encoderPrperties);
encoderPrperties.flush();
encoderPrperties.close();
encoder.code(in, out, -1, -1, null);
out.flush();
out.close();
in.close();
return encoderPrperties.toByteArray();
}
public int[] decompress(byte[] properties) throws IOException {
InputStream in = new ByteArrayInputStream(compressedOut.toByteArray());
ByteArrayOutputStream data = new ByteArrayOutputStream(10 * 1024);
BufferedOutputStream out = new BufferedOutputStream(data);
Decoder decoder = new Decoder();
decoder.setDecoderProperties(properties);
decoder.code(in, out, 4 * size);
out.flush();
out.close();
in.close();
DataInputStream input = new DataInputStream(new ByteArrayInputStream(data.toByteArray()));
int[] array = new int[size];
for (int k = 0; k < size; k++) {
array[k] = input.readInt();
}
return array;
}
}
static class Sorter {
private Bucket[] bucket = new Bucket[BUCKETS];
public void doSort(Producer p, Consumer c) throws IOException {
for (int i = 0; i < bucket.length; i++) { // allocate buckets
bucket[i] = new Bucket();
}
for(int i=0; i< NUM_COUNT; i++) { // produce some data
int value = p.produce();
int bucketId = value/BUCKET_RANGE;
bucket[bucketId].write(value);
c.register(value);
}
for (int i = 0; i < bucket.length; i++) { // submit non-empty buffers
bucket[i].submitBuffer();
}
byte[] compressProperties = null;
for (int i = 0; i < bucket.length; i++) { // compress the data
compressProperties = bucket[i].compressData();
}
printStatistics();
for (int i = 0; i < bucket.length; i++) { // decode & sort buckets one by one
int[] array = bucket[i].decompress(compressProperties);
Arrays.sort(array);
for(int v : array) {
c.consume(v);
}
}
c.finalCheck();
}
public void printStatistics() {
int size = 0;
int sizeCompressed = 0;
for (int i = 0; i < BUCKETS; i++) {
int bucketSize = 4*bucket[i].size;
size += bucketSize;
sizeCompressed += bucket[i].compressedOut.size();
System.out.println(" bucket[" + i
+ "] contains: " + bucket[i].size
+ " numbers, compressed size: " + bucket[i].compressedOut.size()
+ String.format(" compression factor: %.2f", ((double)bucket[i].compressedOut.size())/bucketSize));
}
System.out.println(String.format("Data size: %.2fM",(double)size/(1014*1024))
+ String.format(" compressed %.2fM",(double)sizeCompressed/(1014*1024))
+ String.format(" compression factor %.2f",(double)sizeCompressed/size));
}
}
static class Consumer {
private Set<Integer> values = new HashSet<>();
int v = -1;
public void consume(int value) {
if(v < 0) v = value;
if(v > value) {
throw new IllegalArgumentException("Current value is greater than previous: " + v + " > " + value);
}else{
v = value;
values.remove(value);
}
}
public void register(int value) {
values.add(value);
}
public void finalCheck() {
System.out.println(values.size() > 0 ? "NOT OK: " + values.size() : "OK!");
}
}
public static void main(String[] args) throws IOException {
Producer p = new Producer();
Consumer c = new Consumer();
Sorter sorter = new Sorter();
sorter.doSort(p, c);
}
}
With random numbers it produces the following:
bucket[0] contains: 200357 numbers, compressed size: 353679 compression factor: 0.44
bucket[1] contains: 199465 numbers, compressed size: 352127 compression factor: 0.44
bucket[2] contains: 199682 numbers, compressed size: 352464 compression factor: 0.44
bucket[3] contains: 199949 numbers, compressed size: 352947 compression factor: 0.44
bucket[4] contains: 200547 numbers, compressed size: 353914 compression factor: 0.44
Data size: 3.85M compressed 1.70M compression factor 0.44
For a simple ascending sequence (one bucket is used) it produces:
bucket[0] contains: 1000000 numbers, compressed size: 256700 compression factor: 0.06
Data size: 3.85M compressed 0.25M compression factor 0.06
EDIT
Conclusion:
Don't try to fool the Nature
Use simpler compression with lower memory footprint
Some additional clues are really needed. Common bullet-proof solution does not seem to be feasible.
Stage 2: Enhanced compression, final conclusion
As was already mentioned in the previous section, any suitable compression technique can be used. So let's get rid of LZMA in favor of simpler and better (if possible) approach. There are a lot of good solutions including Arithmetic coding, Radix tree etc.
Anyway, simple but useful encoding scheme will be more illustrative than yet another external library, providing some nifty algorithm. The actual solution is pretty straightforward: since there are buckets with partially sorted data, deltas can be used instead of numbers.
Random input test shows slightly better results:
bucket[0] contains: 10103 numbers, compressed size: 13683 compression factor: 0.34
bucket[1] contains: 9885 numbers, compressed size: 13479 compression factor: 0.34
...
bucket[98] contains: 10026 numbers, compressed size: 13612 compression factor: 0.34
bucket[99] contains: 10058 numbers, compressed size: 13701 compression factor: 0.34
Data size: 3.85M compressed 1.31M compression factor 0.34
Sample code
public static void encode(int[] buffer, int length, BinaryOut output) {
short size = (short)(length & 0x7FFF);
output.write(size);
output.write(buffer[0]);
for(int i=1; i< size; i++) {
int next = buffer[i] - buffer[i-1];
int bits = getBinarySize(next);
int len = bits;
if(bits > 24) {
output.write(3, 2);
len = bits - 24;
}else if(bits > 16) {
output.write(2, 2);
len = bits-16;
}else if(bits > 8) {
output.write(1, 2);
len = bits - 8;
}else{
output.write(0, 2);
}
if (len > 0) {
if ((len % 2) > 0) {
len = len / 2;
output.write(len, 2);
output.write(false);
} else {
len = len / 2 - 1;
output.write(len, 2);
}
output.write(next, bits);
}
}
}
public static short decode(BinaryIn input, int[] buffer, int offset) {
short length = input.readShort();
int value = input.readInt();
buffer[offset] = value;
for (int i = 1; i < length; i++) {
int flag = input.readInt(2);
int bits;
int next = 0;
switch (flag) {
case 0:
bits = 2 * input.readInt(2) + 2;
next = input.readInt(bits);
break;
case 1:
bits = 8 + 2 * input.readInt(2) +2;
next = input.readInt(bits);
break;
case 2:
bits = 16 + 2 * input.readInt(2) +2;
next = input.readInt(bits);
break;
case 3:
bits = 24 + 2 * input.readInt(2) +2;
next = input.readInt(bits);
break;
}
buffer[offset + i] = buffer[offset + i - 1] + next;
}
return length;
}
Please note, this approach:
does not consume a lot of memory
works with streams
provides not so bad results
Full code can be found here, BinaryInput and BinaryOutput implementations can be found here
Final conclusion
No final conclusion :) Sometimes it is really good idea to move one level up and review the task from a meta-level point of view.
It was fun to spend some time with this task. BTW, there are a lot of interesting answers below. Thank you for your attention and happy codding.

A solution is possible only because of the difference between 1 megabyte and 1 million bytes. There are about 2 to the power 8093729.5 different ways to choose 1 million 8-digit numbers with duplicates allowed and order unimportant, so a machine with only 1 million bytes of RAM doesn't have enough states to represent all the possibilities. But 1M (less 2k for TCP/IP) is 1022*1024*8 = 8372224 bits, so a solution is possible.
Part 1, initial solution
This approach needs a little more than 1M, I'll refine it to fit into 1M later.
I'll store a compact sorted list of numbers in the range 0 to 99999999 as a sequence of sublists of 7-bit numbers. The first sublist holds numbers from 0 to 127, the second sublist holds numbers from 128 to 255, etc. 100000000/128 is exactly 781250, so 781250 such sublists will be needed.
Each sublist consists of a 2-bit sublist header followed by a sublist body. The sublist body takes up 7 bits per sublist entry. The sublists are all concatenated together, and the format makes it possible to tell where one sublist ends and the next begins. The total storage required for a fully populated list is 2*781250 + 7*1000000 = 8562500 bits, which is about 1.021 M-bytes.
The 4 possible sublist header values are:
00 Empty sublist, nothing follows.
01 Singleton, there is only one entry in the sublist and and next 7 bits hold it.
10 The sublist holds at least 2 distinct numbers. The entries are stored in non-decreasing order, except that the last entry is less than or equal to the first. This allows the end of the sublist to be identified. For example, the numbers 2,4,6 would be stored as (4,6,2). The numbers 2,2,3,4,4 would be stored as (2,3,4,4,2).
11 The sublist holds 2 or more repetitions of a single number. The next 7 bits give the number. Then come zero or more 7-bit entries with the value 1, followed by a 7-bit entry with the value 0. The length of the sublist body dictates the number of repetitions. For example, the numbers 12,12 would be stored as (12,0), the numbers 12,12,12 would be stored as (12,1,0), 12,12,12,12 would be (12,1,1,0) and so on.
I start off with an empty list, read a bunch of numbers in and store them as 32 bit integers, sort the new numbers in place (using heapsort, probably) and then merge them into a new compact sorted list. Repeat until there are no more numbers to read, then walk the compact list once more to generate the output.
The line below represents memory just before the start of the list merge operation. The "O"s are the region that hold the sorted 32-bit integers. The "X"s are the region that hold the old compact list. The "=" signs are the expansion room for the compact list, 7 bits for each integer in the "O"s. The "Z"s are other random overhead.
ZZZOOOOOOOOOOOOOOOOOOOOOOOOOO==========XXXXXXXXXXXXXXXXXXXXXXXXXX
The merge routine starts reading at the leftmost "O" and at the leftmost "X", and starts writing at the leftmost "=". The write pointer doesn't catch the compact list read pointer until all of the new integers are merged, because both pointers advance 2 bits for each sublist and 7 bits for each entry in the old compact list, and there is enough extra room for the 7-bit entries for the new numbers.
Part 2, cramming it into 1M
To Squeeze the solution above into 1M, I need to make the compact list format a bit more compact. I'll get rid of one of the sublist types, so that there will be just 3 different possible sublist header values. Then I can use "00", "01" and "1" as the sublist header values and save a few bits. The sublist types are:
A Empty sublist, nothing follows.
B Singleton, there is only one entry in the sublist and and next 7 bits hold it.
C The sublist holds at least 2 distinct numbers. The entries are stored in non-decreasing order, except that the last entry is less than or equal to the first. This allows the end of the sublist to be identified. For example, the numbers 2,4,6 would be stored as (4,6,2). The numbers 2,2,3,4,4 would be stored as (2,3,4,4,2).
D The sublist consists of 2 or more repetitions of a single number.
My 3 sublist header values will be "A", "B" and "C", so I need a way to represent D-type sublists.
Suppose I have the C-type sublist header followed by 3 entries, such as "C[17][101][58]". This can't be part of a valid C-type sublist as described above, since the third entry is less than the second but more than the first. I can use this type of construct to represent a D-type sublist. In bit terms, anywhere I have "C{00?????}{1??????}{01?????}" is an impossible C-type sublist. I'll use this to represent a sublist consisting of 3 or more repetitions of a single number. The first two 7-bit words encode the number (the "N" bits below) and are followed by zero or more {0100001} words followed by a {0100000} word.
For example, 3 repetitions: "C{00NNNNN}{1NN0000}{0100000}", 4 repetitions: "C{00NNNNN}{1NN0000}{0100001}{0100000}", and so on.
That just leaves lists that hold exactly 2 repetitions of a single number. I'll represent those with another impossible C-type sublist pattern: "C{0??????}{11?????}{10?????}". There's plenty of room for the 7 bits of the number in the first 2 words, but this pattern is longer than the sublist that it represents, which makes things a bit more complex. The five question-marks at the end can be considered not part of the pattern, so I have: "C{0NNNNNN}{11N????}10" as my pattern, with the number to be repeated stored in the "N"s. That's 2 bits too long.
I'll have to borrow 2 bits and pay them back from the 4 unused bits in this pattern. When reading, on encountering "C{0NNNNNN}{11N00AB}10", output 2 instances of the number in the "N"s, overwrite the "10" at the end with bits A and B, and rewind the read pointer by 2 bits. Destructive reads are ok for this algorithm, since each compact list gets walked only once.
When writing a sublist of 2 repetitions of a single number, write "C{0NNNNNN}11N00" and set the borrowed bits counter to 2. At every write where the borrowed bits counter is non-zero, it is decremented for each bit written and "10" is written when the counter hits zero. So the next 2 bits written will go into slots A and B, and then the "10" will get dropped onto the end.
With 3 sublist header values represented by "00", "01" and "1", I can assign "1" to the most popular sublist type. I'll need a small table to map sublist header values to sublist types, and I'll need an occurrence counter for each sublist type so that I know what the best sublist header mapping is.
The worst case minimal representation of a fully populated compact list occurs when all the sublist types are equally popular. In that case I save 1 bit for every 3 sublist headers, so the list size is 2*781250 + 7*1000000 - 781250/3 = 8302083.3 bits. Rounding up to a 32 bit word boundary, thats 8302112 bits, or 1037764 bytes.
1M minus the 2k for TCP/IP state and buffers is 1022*1024 = 1046528 bytes, leaving me 8764 bytes to play with.
But what about the process of changing the sublist header mapping ? In the memory map below, "Z" is random overhead, "=" is free space, "X" is the compact list.
ZZZ=====XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Start reading at the leftmost "X" and start writing at the leftmost "=" and work right. When it's done the compact list will be a little shorter and it will be at the wrong end of memory:
ZZZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=======
So then I'll need to shunt it to the right:
ZZZ=======XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
In the header mapping change process, up to 1/3 of the sublist headers will be changing from 1-bit to 2-bit. In the worst case these will all be at the head of the list, so I'll need at least 781250/3 bits of free storage before I start, which takes me back to the memory requirements of the previous version of the compact list :(
To get around that, I'll split the 781250 sublists into 10 sublist groups of 78125 sublists each. Each group has its own independent sublist header mapping. Using the letters A to J for the groups:
ZZZ=====AAAAAABBCCCCDDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
Each sublist group shrinks or stays the same during a sublist header mapping change:
ZZZ=====AAAAAABBCCCCDDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAA=====BBCCCCDDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABB=====CCCCDDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABBCCC======DDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABBCCCDDDDD======EEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABBCCCDDDDDEEE======FFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABBCCCDDDDDEEEFFF======GGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABBCCCDDDDDEEEFFFGGGGGGGGGG=======HHIJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABBCCCDDDDDEEEFFFGGGGGGGGGGHH=======IJJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABBCCCDDDDDEEEFFFGGGGGGGGGGHHI=======JJJJJJJJJJJJJJJJJJJJ
ZZZAAAAAABBCCCDDDDDEEEFFFGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ=======
ZZZ=======AAAAAABBCCCDDDDDEEEFFFGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
The worst case temporary expansion of a sublist group during a mapping change is 78125/3 = 26042 bits, under 4k. If I allow 4k plus the 1037764 bytes for a fully populated compact list, that leaves me 8764 - 4096 = 4668 bytes for the "Z"s in the memory map.
That should be plenty for the 10 sublist header mapping tables, 30 sublist header occurrence counts and the other few counters, pointers and small buffers I'll need, and space I've used without noticing, like stack space for function call return addresses and local variables.
Part 3, how long would it take to run?
With an empty compact list the 1-bit list header will be used for an empty sublist, and the starting size of the list will be 781250 bits. In the worst case the list grows 8 bits for each number added, so 32 + 8 = 40 bits of free space are needed for each of the 32-bit numbers to be placed at the top of the list buffer and then sorted and merged. In the worst case, changing the sublist header mapping results in a space usage of 2*781250 + 7*entries - 781250/3 bits.
With a policy of changing the sublist header mapping after every fifth merge once there are at least 800000 numbers in the list, a worst case run would involve a total of about 30M of compact list reading and writing activity.
Source:
http://nick.cleaton.net/ramsortsol.html

Gilmanov's answer is very wrong in its assumptions. It starts speculating based in a pointless measure of a million consecutive integers. That means no gaps. Those random gaps, however small, really makes it a poor idea.
Try it yourself. Get 1 million random 27-bit integers, sort them, compress with 7-Zip, xz, whatever LZMA you want. The result is over 1.5 MB. The premise on top is the compression of sequential numbers. Even delta encoding of that is over 1.1 MB. And never mind, this is using over 100 MB of RAM for compression. So even the compressed integers don't fit the problem and never mind run time RAM usage.
It's saddens me how people just upvote pretty graphics and rationalization.
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
int32_t ints[1000000]; // Random 27-bit integers
int cmpi32(const void *a, const void *b) {
return ( *(int32_t *)a - *(int32_t *)b );
}
int main() {
int32_t *pi = ints; // Pointer to input ints (REPLACE W/ read from net)
// Fill pseudo-random integers of 27 bits
srand(time(NULL));
for (int i = 0; i < 1000000; i++)
ints[i] = rand() & ((1<<27) - 1); // Random 32 bits masked to 27 bits
qsort(ints, 1000000, sizeof (ints[0]), cmpi32); // Sort 1000000 int32s
// Now delta encode, optional, store differences to previous int
for (int i = 1, prev = ints[0]; i < 1000000; i++) {
ints[i] -= prev;
prev += ints[i];
}
FILE *f = fopen("ints.bin", "w");
fwrite(ints, 4, 1000000, f);
fclose(f);
exit(0);
}
Now compress ints.bin with LZMA...
$ xz -f --keep ints.bin # 100 MB RAM
$ 7z a ints.bin.7z ints.bin # 130 MB RAM
$ ls -lh ints.bin*
3.8M ints.bin
1.1M ints.bin.7z
1.2M ints.bin.xz

I think one way to think about this is from a combinatorics viewpoint: how many possible combinations of sorted number orderings are there? If we give the combination 0,0,0,....,0 the code 0, and 0,0,0,...,1 the code 1, and 99999999, 99999999, ... 99999999 the code N, what is N? In other words, how big is the result space?
Well, one way to think about this is noticing that this is a bijection of the problem of finding the number of monotonic paths in an N x M grid, where N = 1,000,000 and M = 100,000,000. In other words, if you have a grid that is 1,000,000 wide and 100,000,000 tall, how many shortest paths from the bottom left to the top right are there? Shortest paths of course require you only ever either move right or up (if you were to move down or left you would be undoing previously accomplished progress). To see how this is a bijection of our number sorting problem, observe the following:
You can imagine any horizontal leg in our path as a number in our ordering, where the Y location of the leg represents the value.
So if the path simply moves to the right all the way to the end, then jumps all the way to the top, that is equivalent to the ordering 0,0,0,...,0. if it instead begins by jumping all the way to the top and then moves to the right 1,000,000 times, that is equivalent to 99999999,99999999,..., 99999999. A path where it moves right once, then up once, then right one, then up once, etc to the very end (then necessarily jumps all the way to the top), is equivalent to 0,1,2,3,...,999999.
Luckily for us this problem has already been solved, such a grid has (N + M) Choose (M) paths:
(1,000,000 + 100,000,000) Choose (100,000,000) ~= 2.27 * 10^2436455
N thus equals 2.27 * 10^2436455, and so the code 0 represents 0,0,0,...,0 and the code 2.27 * 10^2436455 and some change represents 99999999,99999999,..., 99999999.
In order to store all the numbers from 0 to 2.27 * 10^2436455 you need lg2 (2.27 * 10^2436455) = 8.0937 * 10^6 bits.
1 megabyte = 8388608 bits > 8093700 bits
So it appears that we at least actually have enough room to store the result! Now of course the interesting bit is doing the sorting as the numbers stream in. Not sure the best approach to this is given we have 294908 bits remaining. I imagine an interesting technique would be to at each point assume that that is is the entire ordering, finding the code for that ordering, and then as you receive a new number going back and updating the previous code. Hand wave hand wave.

My suggestions here owe a lot to Dan's solution
First off I assume the solution must handle all possible input lists. I think the popular answers do not make this assumption (which IMO is a huge mistake).
It is known that no form of lossless compression will reduce the size of all inputs.
All the popular answers assume they will be able to apply compression effective enough to allow them extra space. In fact, a chunk of extra space large enough to hold some portion of their partially completed list in an uncompressed form and allow them to perform their sorting operations. This is just a bad assumption.
For such a solution, anyone with knowledge of how they do their compression will be able to design some input data that does not compress well for this scheme, and the "solution" will most likely then break due to running out of space.
Instead I take a mathematical approach. Our possible outputs are all the lists of length LEN consisting of elements in the range 0..MAX. Here the LEN is 1,000,000 and our MAX is 100,000,000.
For arbitrary LEN and MAX, the amount of bits needed to encode this state is:
Log2(MAX Multichoose LEN)
So for our numbers, once we have completed recieving and sorting, we will need at least Log2(100,000,000 MC 1,000,000) bits to store our result in a way that can uniquely distinguish all possible outputs.
This is ~= 988kb. So we actually have enough space to hold our result. From this point of view, it is possible.
[Deleted pointless rambling now that better examples exist...]
Best answer is here.
Another good answer is here and basically uses insertion sort as the function to expand the list by one element (buffers a few elements and pre-sorts, to allow insertion of more than one at a time, saves a bit of time). uses a nice compact state encoding too, buckets of seven bit deltas

Suppose this task is possible. Just prior to output, there will be an in-memory representation of the million sorted numbers. How many different such representations are there? Since there may be repeated numbers we can't use nCr (choose), but there is an operation called multichoose that works on multisets.
There are 2.2e2436455 ways to choose a million numbers in range 0..99,999,999.
That requires 8,093,730 bits to represent every possible combination, or 1,011,717 bytes.
So theoretically it may be possible, if you can come up with a sane (enough) representation of the sorted list of numbers. For example, an insane representation might require a 10MB lookup table or thousands of lines of code.
However, if "1M RAM" means one million bytes, then clearly there is not enough space. The fact that 5% more memory makes it theoretically possible suggests to me that the representation will have to be VERY efficient and probably not sane.

(My original answer was wrong, sorry for the bad math, see below the break.)
How about this?
The first 27 bits store the lowest number you have seen, then the difference to the next number seen, encoded as follows: 5 bits to store the number of bits used in storing the difference, then the difference. Use 00000 to indicate that you saw that number again.
This works because as more numbers are inserted, the average difference between numbers goes down, so you use less bits to store the difference as you add more numbers. I believe this is called a delta list.
The worst case I can think of is all numbers evenly spaced (by 100), e.g. Assuming 0 is the first number:
000000000000000000000000000 00111 1100100
^^^^^^^^^^^^^
a million times
27 + 1,000,000 * (5+7) bits = ~ 427k
Reddit to the rescue!
If all you had to do was sort them, this problem would be easy. It takes 122k (1 million bits) to store which numbers you have seen (0th bit on if 0 was seen, 2300th bit on if 2300 was seen, etc.
You read the numbers, store them in the bit field, and then shift the bits out while keeping a count.
BUT, you have to remember how many you have seen. I was inspired by the sublist answer above to come up with this scheme:
Instead of using one bit, use either 2 or 27 bits:
00 means you did not see the number.
01 means you saw it once
1 means you saw it, and the next 26 bits are the count of how many times.
I think this works: if there are no duplicates, you have a 244k list.
In the worst case you see each number twice (if you see one number three times, it shortens the rest of the list for you), that means you have seen 50,000 more than once, and you have seen 950,000 items 0 or 1 times.
50,000 * 27 + 950,000 * 2 = 396.7k.
You can make further improvements if you use the following encoding:
0 means you did not see the number
10 means you saw it once
11 is how you keep count
Which will, on average, result in 280.7k of storage.
EDIT: my Sunday morning math was wrong.
The worst case is we see 500,000 numbers twice, so the math becomes:
500,000 *27 + 500,000 *2 = 1.77M
The alternate encoding results in an average storage of
500,000 * 27 + 500,000 = 1.70M
: (

There is one solution to this problem across all possible inputs. Cheat.
Read m values over TCP, where m is near the max that can be sorted in memory, maybe n/4.
Sort the 250,000 (or so) numbers and output them.
Repeat for the other 3 quarters.
Let the receiver merge the 4 lists of numbers it has received as it processes them. (It's not much slower than using a single list.)

What kind of computer are you using? It may not have any other "normal" local storage, but does it have video RAM, for example? 1 megapixel x 32 bits per pixel (say) is pretty close to your required data input size.
(I largely ask in memory of the old Acorn RISC PC, which could 'borrow' VRAM to expand the available system RAM, if you chose a low resolution or low colour-depth screen mode!). This was rather useful on a machine with only a few MB of normal RAM.

I would try a Radix Tree. If you could store the data in a tree, you could then do an in-order traverse to transmit the data.
I'm not sure you could fit this into 1MB, but I think it's worth a try.

A radix tree representation would come close to handling this problem, since the radix tree takes advantage of "prefix compression". But it's hard to conceive of a radix tree representation that could represent a single node in one byte -- two is probably about the limit.
But, regardless of how the data is represented, once it is sorted it can be stored in prefix-compressed form, where the numbers 10, 11, and 12 would be represented by, say 001b, 001b, 001b, indicating an increment of 1 from the previous number. Perhaps, then, 10101b would represent an increment of 5, 1101001b an increment of 9, etc.

There are 10^6 values in a range of 10^8, so there's one value per hundred code points on average. Store the distance from the Nth point to the (N+1)th. Duplicate values have a skip of 0. This means that the skip needs an average of just under 7 bits to store, so a million of them will happily fit into our 8 million bits of storage.
These skips need to be encoded into a bitstream, say by Huffman encoding. Insertion is by iterating through the bitstream and rewriting after the new value. Output by iterating through and writing out the implied values. For practicality, it probably wants to be done as, say, 10^4 lists covering 10^4 code points (and an average of 100 values) each.
A good Huffman tree for random data can be built a priori by assuming a Poisson distribution (mean=variance=100) on the length of the skips, but real statistics can be kept on the input and used to generate an optimal tree to deal with pathological cases.

I have a computer with 1M of RAM and no other local storage
Another way to cheat: you could use non-local (networked) storage instead (your question does not preclude this) and call a networked service that could use straightforward disk-based mergesort (or just enough RAM to sort in-memory, since you only need to accept 1M numbers), without needing the (admittedly extremely ingenious) solutions already given.
This might be cheating, but it's not clear whether you are looking for a solution to a real-world problem, or a puzzle that invites bending of the rules... if the latter, then a simple cheat may get better results than a complex but "genuine" solution (which as others have pointed out, can only work for compressible inputs).

Google's (bad) approach, from HN thread. Store RLE-style counts.
Your initial data structure is '99999999:0' (all zeros, haven't seen any numbers) and then lets say you see the number 3,866,344 so your data structure becomes '3866343:0,1:1,96133654:0' as you can see the numbers will always alternate between number of zero bits and number of '1' bits so you can just assume the odd numbers represent 0 bits and the even numbers 1 bits. This becomes (3866343,1,96133654)
Their problem doesn't seem to cover duplicates, but let's say they use "0:1" for duplicates.
Big problem #1: insertions for 1M integers would take ages.
Big problem #2: like all plain delta encoding solutions, some distributions can't be covered this way. For example, 1m integers with distances 0:99 (e.g. +99 each one). Now think the same but with random distance in the range of 0:99. (Note: 99999999/1000000 = 99.99)
Google's approach is both unworthy (slow) and incorrect. But to their defense, their problem might have been slightly different.

I think the solution is to combine techniques from video encoding, namely the discrete cosine transformation. In digital video, rather recording the changing the brightness or colour of video as regular values such as 110 112 115 116, each is subtracted from the last (similar to run length encoding). 110 112 115 116 becomes 110 2 3 1. The values, 2 3 1 require less bits than the originals.
So lets say we create a list of the input values as they arrive on the socket. We are storing in each element, not the value, but the offset of the one before it. We sort as we go, so the offsets are only going to be positive. But the offset could be 8 decimal digits wide which this fits in 3 bytes. Each element can't be 3 bytes, so we need to pack these. We could use the top bit of each byte as a "continue bit", indicating that the next byte is part of the number and the lower 7 bits of each byte need to be combined. zero is valid for duplicates.
As the list fills up, the numbers should be get closer together, meaning on average only 1 byte is used to determine the distance to the next value. 7 bits of value and 1 bit of offset if convenient, but there may be a sweet spot that requires less than 8 bits for a "continue" value.
Anyway, I did some experiment. I use a random number generator and I can fit a million sorted 8 digit decimal numbers into about 1279000 bytes. The average space between each number is consistently 99...
public class Test {
public static void main(String[] args) throws IOException {
// 1 million values
int[] values = new int[1000000];
// create random values up to 8 digits lrong
Random random = new Random();
for (int x=0;x<values.length;x++) {
values[x] = random.nextInt(100000000);
}
Arrays.sort(values);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int av = 0;
writeCompact(baos, values[0]); // first value
for (int x=1;x<values.length;x++) {
int v = values[x] - values[x-1]; // difference
av += v;
System.out.println(values[x] + " diff " + v);
writeCompact(baos, v);
}
System.out.println("Average offset " + (av/values.length));
System.out.println("Fits in " + baos.toByteArray().length);
}
public static void writeCompact(OutputStream os, long value) throws IOException {
do {
int b = (int) value & 0x7f;
value = (value & 0x7fffffffffffffffl) >> 7;
os.write(value == 0 ? b : (b | 0x80));
} while (value != 0);
}
}

We could play with the networking stack to send the numbers in sorted order before we have all the numbers. If you send 1M of data, TCP/IP will break it into 1500 byte packets and stream them in order to the target. Each packet will be given a sequence number.
We can do this by hand. Just before we fill our RAM we can sort what we have and send the list to our target but leave holes in our sequence around each number. Then process the 2nd 1/2 of the numbers the same way using those holes in the sequence.
The networking stack on the far end will assemble the resulting data stream in order of sequence before handing it up to the application.
It's using the network to perform a merge sort. This is a total hack, but I was inspired by the other networking hack listed previously.

I would exploit the retransmission behaviour of TCP.
Make the TCP component create a large receive window.
Receive some amount of packets without sending an ACK for them.
Process those in passes creating some (prefix) compressed data structure
Send duplicate ack for last packet that is not needed anymore/wait for retransmission timeout
Goto 2
All packets were accepted
This assumes some kind of benefit of buckets or multiple passes.
Probably by sorting the batches/buckets and merging them. -> radix trees
Use this technique to accept and sort the first 80% then read the last 20%, verify that the last 20% do not contain numbers that would land in the first 20% of the lowest numbers. Then send the 20% lowest numbers, remove from memory, accept the remaining 20% of new numbers and merge.**

To represent the sorted array one can just store the first element and the difference between adjacent elements. In this way we are concerned with encoding 10^6 elements that can sum up to at most 10^8. Let's call this D. To encode the elements of D one can use a Huffman code. The dictionary for the Huffman code can be created on the go and the array updated every time a new item is inserted in the sorted array (insertion sort). Note that when the dictionary changes because of a new item the whole array should be updated to match the new encoding.
The average number of bits for encoding each element of D is maximized if we have equal number of each unique element. Say elements d1, d2, ..., dN in D each appear F times. In that case (in worst case we have both 0 and 10^8 in input sequence) we have
sum(1<=i<=N) F. di = 10^8
where
sum(1<=i<=N) F = 10^6, or F=10^6/N and the normalized frequency will be p= F/10^=1/N
The average number of bits will be -log2(1/P) = log2(N). Under these circumstances we should find a case that maximizes N. This happens if we have consecutive numbers for di starting from 0, or, di= i-1, therefore
10^8=sum(1<=i<=N) F. di = sum(1<=i<=N) (10^6/N) (i-1) = (10^6/N) N (N-1)/2
i.e.
N <= 201. And for this case average number of bits is log2(201)=7.6511 which means we will need around 1 byte per input element for saving the sorted array. Note that this doesn't mean D in general cannot have more than 201 elements. It just sows that if elements of D are uniformly distributed, it cannot have more than 201 unique values.

Here is a generalized solution to this kind of problem:
General procedure
The taken approach is as follows. The algorithm operates on a single buffer of 32-bit words. It performs the following procedure in a loop:
We start with a buffer filled with compressed data from the last iteration. The buffer looks like this
|compressed sorted|empty|
Calculate the maximum amount of numbers that can be stored in this buffer, both compressed and uncompressed. Split the buffer into these two sections, beginning with the space for compressed data, ending with the uncompressed data. The buffer looks like
|compressed sorted|empty|empty|
Fill the uncompressed section with numbers to be sorted. The buffer looks like
|compressed sorted|empty|uncompressed unsorted|
Sort the new numbers with an in-place sort. The buffer looks like
|compressed sorted|empty|uncompressed sorted|
Right-align any already compressed data from the previous iteration in the compressed section. At this point the buffer is partitioned
|empty|compressed sorted|uncompressed sorted|
Perform a streaming decompression-recompression on the compressed section, merging in the sorted data in the uncompressed section. The old compressed section is consumed as the new compressed section grows. The buffer looks like
|compressed sorted|empty|
This procedure is performed until all numbers have been sorted.
Compression
This algorithm of course only works when it's possible to calculate the final compressed size of the new sorting buffer before actually knowing what will actually be compressed. Next to that, the compression algorithm needs to be good enough to solve the actual problem.
The used approach uses three steps. First, the algorithm will always store sorted sequences, therefore we can instead store purely the differences between consecutive entries. Each difference is in the range [0, 99999999].
These differences are then encoded as a unary bitstream. A 1 in this stream means "Add 1 to the accumulator, A 0 means "Emit the accumulator as an entry, and reset". So difference N will be represented by N 1's and one 0.
The sum of all differences will approach the maximum value that the algorithm supports, and the count of all differences will approach the amount of values inserted in the algorithm. This means we expect the stream to, at the end, contain max value 1's and count 0's. This allows us to calculate the expected probability of a 0 and 1 in the stream. Namely, the probability of a 0 is count/(count+maxval) and the probability of a 1 is maxval/(count+maxval).
We use these probabilities to define an arithmetic coding model over this bitstream. This arithmetic code will encode exactly this amounts of 1's and 0's in optimal space. We can calculate the space used by this model for any intermediate bitstream as: bits = encoded * log2(1 + amount / maxval) + maxval * log2(1 + maxval / amount). To calculate the total required space for the algorithm, set encoded equal to amount.
To not require a ridiculous amount of iterations, a small overhead can be added to the buffer. This will ensure that the algorithm will at least operate on the amount of numbers that fit in this overhead, as by far the largest time cost of the algorithm is the arithmetic coding compression and decompression each cycle.
Next to that, some overhead is necessary to store bookkeeping data and to handle slight inaccuracies in the fixed-point approximation of the arithmetic coding algorithm, but in total the algorithm is able to fit in 1MiB of space even with an extra buffer that can contain 8000 numbers, for a total of 1043916 bytes of space.
Optimality
Outside of reducing the (small) overhead of the algorithm it should be theoretically impossible to get a smaller result. To just contain the entropy of the final result, 1011717 bytes would be necessary. If we subtract the extra buffer added for efficiency this algorithm used 1011916 bytes to store the final result + overhead.

If the input stream could be received few times this would be much
easier (no information about that, idea and time-performance problem).
Then, we could count the decimal values. With counted values it would be
easy to make the output stream. Compress by counting the values. It
depends what would be in the input stream.

If the input stream could be received few times this would be much easier (no info about that, idea and time-performance problem). Then, we could count the decimal values. With counted values it would be easy to make the output stream. Compress by counting the values.
It depends what would be in the input stream.

Sorting is a secondary problem here. As other said, just storing the integers is hard, and cannot work on all inputs, since 27 bits per number would be necessary.
My take on this is: store only the differences between the consecutive (sorted) integers, as they will be most likely small. Then use a compression scheme, e.g. with 2 additional bits per input number, to encode how many bits the number is stored on.
Something like:
00 -> 5 bits
01 -> 11 bits
10 -> 19 bits
11 -> 27 bits
It should be possible to store a fair number of possible input lists within the given memory constraint. The maths of how to pick the compression scheme to have it work on the maximum number of inputs, are beyond me.
I hope you may be able to exploit domain-specific knowledge of your input to find a good enough integer compression scheme based on this.
Oh and then, you do an insertion sort on that sorted list as you receive data.

Now aiming to an actual solution, covering all possible cases of input in the 8 digit range with only 1MB of RAM. NOTE: work in progress, tomorrow will continue. Using arithmetic coding of deltas of the sorted ints, worst case for 1M sorted ints would cost about 7bits per entry (since 99999999/1000000 is 99, and log2(99) is almost 7 bits).
But you need the 1m integers sorted to get to 7 or 8 bits! Shorter series would have bigger deltas, therefore more bits per element.
I'm working on taking as many as possible and compressing (almost) in-place. First batch of close to 250K ints would need about 9 bits each at best. So result would take about 275KB. Repeat with remaining free memory a few times. Then decompress-merge-in-place-compress those compressed chunks. This is quite hard, but possible. I think.
The merged lists would get closer and closer to the 7bit per integer target. But I don't know how many iterations it would take of the merge loop. Perhaps 3.
But the imprecision of the arithmetic coding implementation might make it impossible. If this problem is possible at all, it would be extremely tight.
Any volunteers?

You just need to store the differences between the numbers in sequence, and use an encoding to compress these sequence numbers. We have 2^23 bits. We shall divide it into 6bit chunks, and let the last bit indicate whether the number extends to another 6 bits (5bits plus extending chunk).
Thus, 000010 is 1, and 000100 is 2. 000001100000 is 128. Now, we consider the worst cast in representing differences in sequence of a numbers up to 10,000,000. There can be 10,000,000/2^5 differences greater than 2^5, 10,000,000/2^10 differences greater than 2^10, and 10,000,000/2^15 differences greater than 2^15, etc.
So, we add how many bits it will take to represent our the sequence. We have 1,000,000*6 + roundup(10,000,000/2^5)*6+roundup(10,000,000/2^10)*6+roundup(10,000,000/2^15)*6+roundup(10,000,000/2^20)*4=7935479.
2^24 = 8388608. Since 8388608 > 7935479, we should easily have enough memory. We will probably need another little bit of memory to store the sum of where are when we insert new numbers. We then go through the sequence, and find where to insert our new number, decrease the next difference if necessary, and shift everything after it right.

If we don't know anything about those numbers, we are limited by the following constraints:
we need to load all numbers before we can sort them them,
the set of numbers is not compressible.
If these assumptions hold, there is no way to carry out your task, as you will need at least 26,575,425 bits of storage (3,321,929 bytes).
What can you tell us about your data ?

The trick is to represent the algorithms state, which is an integer multi-set, as a compressed stream of "increment counter"="+" and "output counter"="!" characters. For example, the set {0,3,3,4} would be represented as "!+++!!+!", followed by any number of "+" characters. To modify the multi-set you stream out the characters, keeping only a constant amount decompressed at a time, and make changes inplace before streaming them back in compressed form.
Details
We know there are exactly 10^6 numbers in the final set, so there are at most 10^6 "!" characters. We also know that our range has size 10^8, meaning there are at most 10^8 "+" characters. The number of ways we can arrange 10^6 "!"s amongst 10^8 "+"s is (10^8 + 10^6) choose 10^6, and so specifying some particular arrangement takes ~0.965 MiB` of data. That'll be a tight fit.
We can treat each character as independent without exceeding our quota. There are exactly 100 times more "+" characters than "!" characters, which simplifies to 100:1 odds of each character being a "+" if we forget that they are dependent. Odds of 100:101 corresponds to ~0.08 bits per character, for an almost identical total of ~0.965 MiB (ignoring the dependency has a cost of only ~12 bits in this case!).
The simplest technique for storing independent characters with known prior probability is Huffman coding. Note that we need an impractically large tree (A huffman tree for blocks of 10 characters has an average cost per block of about 2.4 bits, for a total of ~2.9 Mib. A huffman tree for blocks of 20 characters has an average cost per block of about 3 bits, which is a total of ~1.8 MiB. We're probably going to need a block of size on the order of a hundred, implying more nodes in our tree than all the computer equipment that has ever existed can store.). However, ROM is technically "free" according to the problem and practical solutions that take advantage of the regularity in the tree will look essentially the same.
Pseudo-code
Have a sufficiently large huffman tree (or similar block-by-block compression data) stored in ROM
Start with a compressed string of 10^8 "+" characters.
To insert the number N, stream out the compressed string until N "+" characters have gone past then insert a "!". Stream the recompressed string back over the previous one as you go, keeping a constant amount of buffered blocks to avoid over/under-runs.
Repeat one million times: [input, stream decompress>insert>compress], then decompress to output

We have 1 MB - 3 KB RAM = 2^23 - 3*2^13 bits = 8388608 - 24576 = 8364032 bits available.
We are given 10^6 numbers in a 10^8 range. This gives an average gap of ~100 < 2^7 = 128
Let's first consider the simpler problem of fairly evenly spaced numbers when all gaps are < 128. This is easy. Just store the first number and the 7-bit gaps:
(27 bits) + 10^6 7-bit gap numbers = 7000027 bits required
Note repeated numbers have gaps of 0.
But what if we have gaps larger than 127?
OK, let's say a gap size < 127 is represented directly, but a gap size of 127 is followed by a continuous 8-bit encoding for the actual gap length:
10xxxxxx xxxxxxxx = 127 .. 16,383
110xxxxx xxxxxxxx xxxxxxxx = 16384 .. 2,097,151
etc.
Note this number representation describes its own length so we know when the next gap number starts.
With just small gaps < 127, this still requires 7000027 bits.
There can be up to (10^8)/(2^7) = 781250 23-bit gap number, requiring an extra 16*781,250 = 12,500,000 bits which is too much. We need a more compact and slowly increasing representation of gaps.
The average gap size is 100 so if we reorder them as
[100, 99, 101, 98, 102, ..., 2, 198, 1, 199, 0, 200, 201, 202, ...]
and index this with a dense binary Fibonacci base encoding with no pairs of zeros (for example, 11011=8+5+2+1=16) with numbers delimited by '00' then I think we can keep the gap representation short enough, but it needs more analysis.

While receiving the stream do these steps.
1st set some reasonable chunk size
Pseudo Code idea:
The first step would be to find all the duplicates and stick them in a dictionary with its count and remove them.
The third step would be to place number that exist in sequence of their algorithmic steps and place them in counters special dictionaries with the first number and their step like n, n+1..., n+2, 2n, 2n+1, 2n+2...
Begin to compress in chunks some reasonable ranges of number like every 1000 or ever 10000 the remaining numbers that appear less often to repeat.
Uncompress that range if a number is found and add it to the range and leave it uncompressed for a while longer.
Otherwise just add that number to a byte[chunkSize]
Continue the first 4 steps while receiving the stream. The final step would be to either fail if you exceeded memory or start outputting the result once all the data is collected by beginning to sort the ranges and spit out the results in order and uncompressing those in order that need to be uncompressed and sort them when you get to them.

Generate an integer that is not among four billion given ones

I have been given this interview question:
Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1 GB memory. Follow up with what you would do if you have only 10 MB of memory.
My analysis:
The size of the file is 4×109×4 bytes = 16 GB.
We can do external sorting, thus letting us know the range of the integers.
My question is what is the best way to detect the missing integer in the sorted big integer sets?
My understanding (after reading all the answers):
Assuming we are talking about 32-bit integers, there are 232 = 4*109 distinct integers.
Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.
Solution:
If we use one bit representing one distinct integer, it is enough. we don't need sort.
Implementation:
int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
Scanner in = new Scanner(new FileReader("a.txt"));
while(in.hasNextInt()){
int n = in.nextInt();
bitfield[n/radix] |= (1 << (n%radix));
}
for(int i = 0; i< bitfield.lenght; i++){
for(int j =0; j<radix; j++){
if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);
}
}
}
Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits
Solution:
For all possible 16-bit prefixes, there are 216 number of
integers = 65536, we need 216 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.
Build the counter of each bucket through the first pass through the file.
Scan the buckets, find the first one who has less than 65536 hit.
Build new buckets whose high 16-bit prefixes are we found in step2
through second pass of the file
Scan the buckets built in step3, find the first bucket which doesnt
have a hit.
The code is very similar to above one.
Conclusion:
We decrease memory through increasing file pass.
A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file—at least that's not how most people interpret it. Many comments in the comment thread are about that variation of the task, though. Unfortunately the comment that introduced it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing, sorry.

Assuming that "integer" means 32 bits: 10 MB of space is more than enough for you to count how many numbers there are in the input file with any given 16-bit prefix, for all possible 16-bit prefixes in one pass through the input file. At least one of the buckets will have be hit less than 216 times. Do a second pass to find of which of the possible numbers in that bucket are used already.
If it means more than 32 bits, but still of bounded size: Do as above, ignoring all input numbers that happen to fall outside the (signed or unsigned; your choice) 32-bit range.
If "integer" means mathematical integer: Read through the input once and keep track of the largest number length of the longest number you've ever seen. When you're done, output the maximum plus one a random number that has one more digit. (One of the numbers in the file may be a bignum that takes more than 10 MB to represent exactly, but if the input is a file, then you can at least represent the length of anything that fits in it).

Statistically informed algorithms solve this problem using fewer passes than deterministic approaches.
If very large integers are allowed then one can generate a number that is likely to be unique in O(1) time. A pseudo-random 128-bit integer like a GUID will only collide with one of the existing four billion integers in the set in less than one out of every 64 billion billion billion cases.
If integers are limited to 32 bits then one can generate a number that is likely to be unique in a single pass using much less than 10 MB. The odds that a pseudo-random 32-bit integer will collide with one of the 4 billion existing integers is about 93% (4e9 / 2^32). The odds that 1000 pseudo-random integers will all collide is less than one in 12,000 billion billion billion (odds-of-one-collision ^ 1000). So if a program maintains a data structure containing 1000 pseudo-random candidates and iterates through the known integers, eliminating matches from the candidates, it is all but certain to find at least one integer that is not in the file.

A detailed discussion on this problem has been discussed in Jon Bentley "Column 1. Cracking the Oyster" Programming Pearls Addison-Wesley pp.3-10
Bentley discusses several approaches, including external sort, Merge Sort using several external files etc., But the best method Bentley suggests is a single pass algorithm using bit fields, which he humorously calls "Wonder Sort" :)
Coming to the problem, 4 billion numbers can be represented in :
4 billion bits = (4000000000 / 8) bytes = about 0.466 GB
The code to implement the bitset is simple: (taken from solutions page )
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];
void set(int i) { a[i>>SHIFT] |= (1<<(i & MASK)); }
void clr(int i) { a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int test(int i){ return a[i>>SHIFT] & (1<<(i & MASK)); }
Bentley's algorithm makes a single pass over the file, setting the appropriate bit in the array and then examines this array using test macro above to find the missing number.
If the available memory is less than 0.466 GB, Bentley suggests a k-pass algorithm, which divides the input into ranges depending on available memory. To take a very simple example, if only 1 byte (i.e memory to handle 8 numbers ) was available and the range was from 0 to 31, we divide this into ranges of 0 to 7, 8-15, 16-22 and so on and handle this range in each of 32/8 = 4 passes.
HTH.

Since the problem does not specify that we have to find the smallest possible number that is not in the file we could just generate a number that is longer than the input file itself. :)

For the 1 GB RAM variant you can use a bit vector. You need to allocate 4 billion bits == 500 MB byte array. For each number you read from the input, set the corresponding bit to '1'. Once you done, iterate over the bits, find the first one that is still '0'. Its index is the answer.

If they are 32-bit integers (likely from the choice of ~4 billion numbers close to 232), your list of 4 billion numbers will take up at most 93% of the possible integers (4 * 109 / (232) ). So if you create a bit-array of 232 bits with each bit initialized to zero (which will take up 229 bytes ~ 500 MB of RAM; remember a byte = 23 bits = 8 bits), read through your integer list and for each int set the corresponding bit-array element from 0 to 1; and then read through your bit-array and return the first bit that's still 0.
In the case where you have less RAM (~10 MB), this solution needs to be slightly modified. 10 MB ~ 83886080 bits is still enough to do a bit-array for all numbers between 0 and 83886079. So you could read through your list of ints; and only record #s that are between 0 and 83886079 in your bit array. If the numbers are randomly distributed; with overwhelming probability (it differs by 100% by about 10-2592069) you will find a missing int). In fact, if you only choose numbers 1 to 2048 (with only 256 bytes of RAM) you'd still find a missing number an overwhelming percentage (99.99999999999999999999999999999999999999999999999999999999999995%) of the time.
But let's say instead of having about 4 billion numbers; you had something like 232 - 1 numbers and less than 10 MB of RAM; so any small range of ints only has a small possibility of not containing the number.
If you were guaranteed that each int in the list was unique, you could sum the numbers and subtract the sum with one # missing to the full sum (½)(232)(232 - 1) = 9223372034707292160 to find the missing int. However, if an int occurred twice this method will fail.
However, you can always divide and conquer. A naive method, would be to read through the array and count the number of numbers that are in the first half (0 to 231-1) and second half (231, 232). Then pick the range with fewer numbers and repeat dividing that range in half. (Say if there were two less number in (231, 232) then your next search would count the numbers in the range (231, 3*230-1), (3*230, 232). Keep repeating until you find a range with zero numbers and you have your answer. Should take O(lg N) ~ 32 reads through the array.
That method was inefficient. We are only using two integers in each step (or about 8 bytes of RAM with a 4 byte (32-bit) integer). A better method would be to divide into sqrt(232) = 216 = 65536 bins, each with 65536 numbers in a bin. Each bin requires 4 bytes to store its count, so you need 218 bytes = 256 kB. So bin 0 is (0 to 65535=216-1), bin 1 is (216=65536 to 2*216-1=131071), bin 2 is (2*216=131072 to 3*216-1=196607). In python you'd have something like:
import numpy as np
nums_in_bin = np.zeros(65536, dtype=np.uint32)
for N in four_billion_int_array:
nums_in_bin[N // 65536] += 1
for bin_num, bin_count in enumerate(nums_in_bin):
if bin_count < 65536:
break # we have found an incomplete bin with missing ints (bin_num)
Read through the ~4 billion integer list; and count how many ints fall in each of the 216 bins and find an incomplete_bin that doesn't have all 65536 numbers. Then you read through the 4 billion integer list again; but this time only notice when integers are in that range; flipping a bit when you find them.
del nums_in_bin # allow gc to free old 256kB array
from bitarray import bitarray
my_bit_array = bitarray(65536) # 32 kB
my_bit_array.setall(0)
for N in four_billion_int_array:
if N // 65536 == bin_num:
my_bit_array[N % 65536] = 1
for i, bit in enumerate(my_bit_array):
if not bit:
print bin_num*65536 + i
break

Why make it so complicated? You ask for an integer not present in the file?
According to the rules specified, the only thing you need to store is the largest integer that you encountered so far in the file. Once the entire file has been read, return a number 1 greater than that.
There is no risk of hitting maxint or anything, because according to the rules, there is no restriction to the size of the integer or the number returned by the algorithm.

This can be solved in very little space using a variant of binary search.
Start off with the allowed range of numbers, 0 to 4294967295.
Calculate the midpoint.
Loop through the file, counting how many numbers were equal, less than or higher than the midpoint value.
If no numbers were equal, you're done. The midpoint number is the answer.
Otherwise, choose the range that had the fewest numbers and repeat from step 2 with this new range.
This will require up to 32 linear scans through the file, but it will only use a few bytes of memory for storing the range and the counts.
This is essentially the same as Henning's solution, except it uses two bins instead of 16k.

EDIT Ok, this wasn't quite thought through as it assumes the integers in the file follow some static distribution. Apparently they don't need to, but even then one should try this:
There are ≈4.3 billion 32-bit integers. We don't know how they are distributed in the file, but the worst case is the one with the highest Shannon entropy: an equal distribution. In this case, the probablity for any one integer to not occur in the file is
( (2³²-1)/2³² )⁴ ⁰⁰⁰ ⁰⁰⁰ ⁰⁰⁰ ≈ .4
The lower the Shannon entropy, the higher this probability gets on the average, but even for this worst case we have a chance of 90% to find a nonoccurring number after 5 guesses with random integers. Just create such numbers with a pseudorandom generator, store them in a list. Then read int after int and compare it to all of your guesses. When there's a match, remove this list entry. After having been through all of the file, chances are you will have more than one guess left. Use any of them. In the rare (10% even at worst case) event of no guess remaining, get a new set of random integers, perhaps more this time (10->99%).
Memory consumption: a few dozen bytes, complexity: O(n), overhead: neclectable as most of the time will be spent in the unavoidable hard disk accesses rather than comparing ints anyway.
The actual worst case, when we do not assume a static distribution, is that every integer occurs max. once, because then only
1 - 4000000000/2³² ≈ 6%
of all integers don't occur in the file. So you'll need some more guesses, but that still won't cost hurtful amounts of memory.

If you have one integer missing from the range [0, 2^x - 1] then just xor them all together. For example:
>>> 0 ^ 1 ^ 3
2
>>> 0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 6 ^ 7
5
(I know this doesn't answer the question exactly, but it's a good answer to a very similar question.)

They may be looking to see if you have heard of a probabilistic Bloom Filter which can very efficiently determine absolutely if a value is not part of a large set, (but can only determine with high probability it is a member of the set.)

Based on the current wording in the original question, the simplest solution is:
Find the maximum value in the file, then add 1 to it.

Use a BitSet. 4 billion integers (assuming up to 2^32 integers) packed into a BitSet at 8 per byte is 2^32 / 2^3 = 2^29 = approx 0.5 Gb.
To add a bit more detail - every time you read a number, set the corresponding bit in the BitSet. Then, do a pass over the BitSet to find the first number that's not present. In fact, you could do this just as effectively by repeatedly picking a random number and testing if it's present.
Actually BitSet.nextClearBit(0) will tell you the first non-set bit.
Looking at the BitSet API, it appears to only support 0..MAX_INT, so you may need 2 BitSets - one for +'ve numbers and one for -'ve numbers - but the memory requirements don't change.

If there is no size limit, the quickest way is to take the length of the file, and generate the length of the file+1 number of random digits (or just "11111..." s). Advantage: you don't even need to read the file, and you can minimize memory use nearly to zero. Disadvantage: You will print billions of digits.
However, if the only factor was minimizing memory usage, and nothing else is important, this would be the optimal solution. It might even get you a "worst abuse of the rules" award.

If we assume that the range of numbers will always be 2^n (an even power of 2), then exclusive-or will work (as shown by another poster). As far as why, let's prove it:
The Theory
Given any 0 based range of integers that has 2^n elements with one element missing, you can find that missing element by simply xor-ing the known values together to yield the missing number.
The Proof
Let's look at n = 2. For n=2, we can represent 4 unique integers: 0, 1, 2, 3. They have a bit pattern of:
0 - 00
1 - 01
2 - 10
3 - 11
Now, if we look, each and every bit is set exactly twice. Therefore, since it is set an even number of times, and exclusive-or of the numbers will yield 0. If a single number is missing, the exclusive-or will yield a number that when exclusive-ored with the missing number will result in 0. Therefore, the missing number, and the resulting exclusive-ored number are exactly the same. If we remove 2, the resulting xor will be 10 (or 2).
Now, let's look at n+1. Let's call the number of times each bit is set in n, x and the number of times each bit is set in n+1 y. The value of y will be equal to y = x * 2 because there are x elements with the n+1 bit set to 0, and x elements with the n+1 bit set to 1. And since 2x will always be even, n+1 will always have each bit set an even number of times.
Therefore, since n=2 works, and n+1 works, the xor method will work for all values of n>=2.
The Algorithm For 0 Based Ranges
This is quite simple. It uses 2*n bits of memory, so for any range <= 32, 2 32 bit integers will work (ignoring any memory consumed by the file descriptor). And it makes a single pass of the file.
long supplied = 0;
long result = 0;
while (supplied = read_int_from_file()) {
result = result ^ supplied;
}
return result;
The Algorithm For Arbitrary Based Ranges
This algorithm will work for ranges of any starting number to any ending number, as long as the total range is equal to 2^n... This basically re-bases the range to have the minimum at 0. But it does require 2 passes through the file (the first to grab the minimum, the second to compute the missing int).
long supplied = 0;
long result = 0;
long offset = INT_MAX;
while (supplied = read_int_from_file()) {
if (supplied < offset) {
offset = supplied;
}
}
reset_file_pointer();
while (supplied = read_int_from_file()) {
result = result ^ (supplied - offset);
}
return result + offset;
Arbitrary Ranges
We can apply this modified method to a set of arbitrary ranges, since all ranges will cross a power of 2^n at least once. This works only if there is a single missing bit. It takes 2 passes of an unsorted file, but it will find the single missing number every time:
long supplied = 0;
long result = 0;
long offset = INT_MAX;
long n = 0;
double temp;
while (supplied = read_int_from_file()) {
if (supplied < offset) {
offset = supplied;
}
}
reset_file_pointer();
while (supplied = read_int_from_file()) {
n++;
result = result ^ (supplied - offset);
}
// We need to increment n one value so that we take care of the missing
// int value
n++
while (n == 1 || 0 != (n & (n - 1))) {
result = result ^ (n++);
}
return result + offset;
Basically, re-bases the range around 0. Then, it counts the number of unsorted values to append as it computes the exclusive-or. Then, it adds 1 to the count of unsorted values to take care of the missing value (count the missing one). Then, keep xoring the n value, incremented by 1 each time until n is a power of 2. The result is then re-based back to the original base. Done.
Here's the algorithm I tested in PHP (using an array instead of a file, but same concept):
function find($array) {
$offset = min($array);
$n = 0;
$result = 0;
foreach ($array as $value) {
$result = $result ^ ($value - $offset);
$n++;
}
$n++; // This takes care of the missing value
while ($n == 1 || 0 != ($n & ($n - 1))) {
$result = $result ^ ($n++);
}
return $result + $offset;
}
Fed in an array with any range of values (I tested including negatives) with one inside that range which is missing, it found the correct value each time.
Another Approach
Since we can use external sorting, why not just check for a gap? If we assume the file is sorted prior to the running of this algorithm:
long supplied = 0;
long last = read_int_from_file();
while (supplied = read_int_from_file()) {
if (supplied != last + 1) {
return last + 1;
}
last = supplied;
}
// The range is contiguous, so what do we do here? Let's return last + 1:
return last + 1;

Trick question, unless it's been quoted improperly. Just read through the file once to get the maximum integer n, and return n+1.
Of course you'd need a backup plan in case n+1 causes an integer overflow.

Check the size of the input file, then output any number which is too large to be represented by a file that size. This may seem like a cheap trick, but it's a creative solution to an interview problem, it neatly sidesteps the memory issue, and it's technically O(n).
void maxNum(ulong filesize)
{
ulong bitcount = filesize * 8; //number of bits in file
for (ulong i = 0; i < bitcount; i++)
{
Console.Write(9);
}
}
Should print 10 bitcount - 1, which will always be greater than 2 bitcount. Technically, the number you have to beat is 2 bitcount - (4 * 109 - 1), since you know there are (4 billion - 1) other integers in the file, and even with perfect compression they'll take up at least one bit each.

The simplest approach is to find the minimum number in the file, and return 1 less than that. This uses O(1) storage, and O(n) time for a file of n numbers. However, it will fail if number range is limited, which could make min-1 not-a-number.
The simple and straightforward method of using a bitmap has already been mentioned. That method uses O(n) time and storage.
A 2-pass method with 2^16 counting-buckets has also been mentioned. It reads 2*n integers, so uses O(n) time and O(1) storage, but it cannot handle datasets with more than 2^16 numbers. However, it's easily extended to (eg) 2^60 64-bit integers by running 4 passes instead of 2, and easily adapted to using tiny memory by using only as many bins as fit in memory and increasing the number of passes correspondingly, in which case run time is no longer O(n) but instead is O(n*log n).
The method of XOR'ing all the numbers together, mentioned so far by rfrankel and at length by ircmaxell answers the question asked in stackoverflow#35185, as ltn100 pointed out. It uses O(1) storage and O(n) run time. If for the moment we assume 32-bit integers, XOR has a 7% probability of producing a distinct number. Rationale: given ~ 4G distinct numbers XOR'd together, and ca. 300M not in file, the number of set bits in each bit position has equal chance of being odd or even. Thus, 2^32 numbers have equal likelihood of arising as the XOR result, of which 93% are already in file. Note that if the numbers in file aren't all distinct, the XOR method's probability of success rises.

Strip the white space and non numeric characters from the file and append 1. Your file now contains a single number not listed in the original file.
From Reddit by Carbonetc.

For some reason, as soon as I read this problem I thought of diagonalization. I'm assuming arbitrarily large integers.
Read the first number. Left-pad it with zero bits until you have 4 billion bits. If the first (high-order) bit is 0, output 1; else output 0. (You don't really have to left-pad: you just output a 1 if there are not enough bits in the number.) Do the same with the second number, except use its second bit. Continue through the file in this way. You will output a 4-billion bit number one bit at a time, and that number will not be the same as any in the file. Proof: it were the same as the nth number, then they would agree on the nth bit, but they don't by construction.

You can use bit flags to mark whether an integer is present or not.
After traversing the entire file, scan each bit to determine if the number exists or not.
Assuming each integer is 32 bit, they will conveniently fit in 1 GB of RAM if bit flagging is done.

Just for the sake of completeness, here is another very simple solution, which will most likely take a very long time to run, but uses very little memory.
Let all possible integers be the range from int_min to int_max, and
bool isNotInFile(integer) a function which returns true if the file does not contain a certain integer and false else (by comparing that certain integer with each integer in the file)
for (integer i = int_min; i <= int_max; ++i)
{
if (isNotInFile(i)) {
return i;
}
}

For the 10 MB memory constraint:
Convert the number to its binary representation.
Create a binary tree where left = 0 and right = 1.
Insert each number in the tree using its binary representation.
If a number has already been inserted, the leafs will already have been created.
When finished, just take a path that has not been created before to create the requested number.
4 billion number = 2^32, meaning 10 MB might not be sufficient.
EDIT
An optimization is possible, if two ends leafs have been created and have a common parent, then they can be removed and the parent flagged as not a solution. This cuts branches and reduces the need for memory.
EDIT II
There is no need to build the tree completely too. You only need to build deep branches if numbers are similar. If we cut branches too, then this solution might work in fact.

I will answer the 1 GB version:
There is not enough information in the question, so I will state some assumptions first:
The integer is 32 bits with range -2,147,483,648 to 2,147,483,647.
Pseudo-code:
var bitArray = new bit[4294967296]; // 0.5 GB, initialized to all 0s.
foreach (var number in file) {
bitArray[number + 2147483648] = 1; // Shift all numbers so they start at 0.
}
for (var i = 0; i < 4294967296; i++) {
if (bitArray[i] == 0) {
return i - 2147483648;
}
}

As long as we're doing creative answers, here is another one.
Use the external sort program to sort the input file numerically. This will work for any amount of memory you may have (it will use file storage if needed).
Read through the sorted file and output the first number that is missing.

Bit Elimination
One way is to eliminate bits, however this might not actually yield a result (chances are it won't). Psuedocode:
long val = 0xFFFFFFFFFFFFFFFF; // (all bits set)
foreach long fileVal in file
{
val = val & ~fileVal;
if (val == 0) error;
}
Bit Counts
Keep track of the bit counts; and use the bits with the least amounts to generate a value. Again this has no guarantee of generating a correct value.
Range Logic
Keep track of a list ordered ranges (ordered by start). A range is defined by the structure:
struct Range
{
long Start, End; // Inclusive.
}
Range startRange = new Range { Start = 0x0, End = 0xFFFFFFFFFFFFFFFF };
Go through each value in the file and try and remove it from the current range. This method has no memory guarantees, but it should do pretty well.

2128*1018 + 1 ( which is (28)16*1018 + 1 ) - cannot it be a universal answer for today? This represents a number that cannot be held in 16 EB file, which is the maximum file size in any current file system.

I think this is a solved problem (see above), but there's an interesting side case to keep in mind because it might get asked:
If there are exactly 4,294,967,295 (2^32 - 1) 32-bit integers with no repeats, and therefore only one is missing, there is a simple solution.
Start a running total at zero, and for each integer in the file, add that integer with 32-bit overflow (effectively, runningTotal = (runningTotal + nextInteger) % 4294967296). Once complete, add 4294967296/2 to the running total, again with 32-bit overflow. Subtract this from 4294967296, and the result is the missing integer.
The "only one missing integer" problem is solvable with only one run, and only 64 bits of RAM dedicated to the data (32 for the running total, 32 to read in the next integer).
Corollary: The more general specification is extremely simple to match if we aren't concerned with how many bits the integer result must have. We just generate a big enough integer that it cannot be contained in the file we're given. Again, this takes up absolutely minimal RAM. See the pseudocode.
# Grab the file size
fseek(fp, 0L, SEEK_END);
sz = ftell(fp);
# Print a '2' for every bit of the file.
for (c=0; c<sz; c++) {
for (b=0; b<4; b++) {
print "2";
}
}

As Ryan said it basically, sort the file and then go over the integers and when a value is skipped there you have it :)
EDIT at downvoters: the OP mentioned that the file could be sorted so this is a valid method.

If you don't assume the 32-bit constraint, just return a randomly generated 64-bit number (or 128-bit if you're a pessimist). The chance of collision is 1 in 2^64/(4*10^9) = 4611686018.4 (roughly 1 in 4 billion). You'd be right most of the time!
(Joking... kind of.)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio