Get 10000+ unique random numbers (performance) [duplicate]

This question already has answers here:
Possible Duplicate: Create Random Number Sequence with No Repeats
Closed 11 years ago.
I'd like to write a URL shortener that uses only numbers as the short string.
I don't want to count up; I want the next new number to be random (or pseudo-random).
My first thought was an algorithm like this (pseudocode):
do
{
    number = random(0, 10000)
}
while (datastore.contains(number))
datastore.store(number, url)
The problem with this implementation: as the datastore fills up, the loop becomes more and more likely to run multiple times, so performance degrades over time.
Isn't there a better way to get a random number that is not already in use?

1) fill an array with sequential values
2) shuffle the array
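A minimal sketch of that approach (Python for illustration; random.shuffle is a Fisher-Yates shuffle):

import random

def unique_ids(n=10000):
    # every value 0..n-1 appears exactly once, so collisions are impossible
    ids = list(range(n))     # 1) fill an array with sequential values
    random.shuffle(ids)      # 2) shuffle the array, O(n)
    return ids

ids = unique_ids()
# hand out ids[0], ids[1], ... as the short strings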

Use encryption. Since encryption is reversible, unique inputs generate unique outputs. For 64-bit numbers use a cypher with a 64-bit block size. For smaller block sizes, such as 32 bit or 16 bit, have a look at the Hasty Pudding cypher.
Whatever block size you need, just encrypt the numbers 0, 1, 2, ... (in the appropriate block size) to generate as many unique non-sequential numbers as you need.

Some related questions: #2394246, #54059, #158716, #196017, and #1608181.
The proper approach depends on how many numbers you will generate and on whether real-time performance is required. If you draw no more than a small fraction of the numbers available in the range, the average time per number for your code snippet is O(1); the time per number creeps up as the datastore fills, but it remains O(1). See, for example, my answer to question #1608181, where I show that getting k numbers from a range of more than 2*k numbers with such code is O(k). (That answer also has C code to generate M numbers from a range of N numbers in O(M) time when M < N/2, and explains how to use it for O(M) time when M >= N/2.)
If you want O(1) performance with a hard time limit, you can use the program just mentioned to pre-load an array, or you can shuffle the whole range of integers, as mentioned by Justin. After that preprocessing, each access is O(1). But if you know you won't draw more than, say, 3000 numbers from your 1...10000 range, and don't have a hard time limit, the code you have will run in O(1) time on average, with the probability of needing k passes decreasing like 0.3^k: at worst about a 70% chance of 1 pass, 21% for 2, 6% for 3, 2% for 4, 0.6% for 5, and so forth.
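As a concrete illustration, here is the asker's loop with a set standing in for the datastore; the set gives the O(1) membership test the argument above assumes:

import random

used = set()          # stands in for datastore.contains(); lookup is O(1)

def next_number(upper=10000):
    # average passes stay O(1) while len(used) is a modest fraction of upper
    while True:
        number = random.randrange(upper)
        if number not in used:
            used.add(number)
            return number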

Related

Shuffle sequential numbers without a buffer

I am looking for a shuffle algorithm to shuffle a set of sequential numbers without buffering. Another way to state this is that I'm looking for a random sequence of unique numbers with a given period.
Your typical Fisher–Yates shuffle needs to hold all of the elements it is going to shuffle, so that isn't going to work.
A Linear-Feedback Shift Register (LFSR) does what I want, but only works for periods that are powers-of-two less two. Here is an example of using a 4-bit LFSR to shuffle the numbers 1-14:
Input:   1  2  3  4  5  6  7  8  9 10 11 12 13 14
Output:  8 12 14  7  4 10  5 11  6  3  2  1  9 13
The first row is the input, and the second row the output. What's nice is that the state is very small: just the current index. You can start at any index and get a different set of numbers (starting at 1 yields: 8, 12, 14; starting at 9: 6, 3, 2), although the sequence is always the same (5 is always followed by 11). If I want a different sequence, I can pick a different generator polynomial.
The limitations of the LFSR are that the period is always a power of two less two (the minimum and maximum values are always fixed, thus unshuffled) and that there are not enough generator polynomials to allow every possible random sequence.
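For reference, a minimal maximal-period 4-bit LFSR in Python. The question doesn't give its generator polynomial, so the taps below (x^4 + x^3 + 1, one primitive choice) produce a different but equally valid ordering:

def lfsr4(seed=1):
    # Fibonacci LFSR: visits all 15 nonzero 4-bit states, then repeats
    state = seed
    while True:
        yield state
        bit = ((state >> 3) ^ (state >> 2)) & 1   # taps at bit positions 4 and 3
        state = ((state << 1) | bit) & 0xF

gen = lfsr4()
print([next(gen) for _ in range(15)])   # a fixed permutation of 1..15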
A block cipher algorithm would work. Every key produces a uniquely shuffled set of numbers. However, all block ciphers (that I know about) have power-of-two block sizes, and usually a fixed or limited set of block sizes. A block cipher with an arbitrary non-binary block size would be perfect, if such a thing exists.
There are a couple of projects I have that could benefit from such an algorithm. One is for small embedded micros that need to produce a shuffled sequence of numbers with a period larger than the memory they have available (think Arduino Uno needing to shuffle 1 to 100,000).
Does such an algorithm exist? If not, what things might I search for to help me develop such an algorithm? Or is this simply not possible?
Edit 2022-01-30
I have received a lot of good feedback and I need to better explain what I am searching for.
In addition to the Arduino example, where memory is an issue, there is also the shuffle of a large number of records (billions to trillions). The desire is to have a shuffle applied to these records without needing a buffer to hold the shuffle order array, or the time needed to build that array.
I do not need an algorithm that could produce every possible permutation, but a large number of permutations. Something like a typical block cipher in counter mode where each key produces a unique sequence of values.
A Linear Congruential Generator with coefficients chosen to produce the desired sequence period will only ever produce a single sequence. The same problem applies to a Linear Feedback Shift Register.
Format-Preserving Encryption (FPE), such as AES FFX, shows promise and is where I am currently focusing my attention. Additional feedback welcome.
It is certainly not possible to produce an algorithm which could potentially generate every possible sequence of length N with less than N·(log2 N - 1.45) bits of state, because there are N! possible sequences and each state can generate exactly one sequence. If your hypothetical Arduino application could produce every possible sequence of 100,000 numbers, it would require at least 1,516,705 bits of state, a bit more than 185 KiB, which is probably more memory than you want to devote to the problem [Note 1].
That's also a lot more memory than you would need for the shuffle buffer; that's because the PRNG driving the shuffle algorithm also doesn't have enough state to come close to being able to generate every possible sequence. It can't generate more different sequences than the number of different possible states that it has.
So you have to make some compromise :-)
One simple algorithm is to start with some parametrisable generator which can produce non-repeating sequences for a large variety of block sizes. Then you just choose a block size which is as least as large as your target range but not "too much larger"; say, less than twice as large. Then you just select a subrange of the block size and start generating numbers. If the generated number is inside the subrange, you return its offset; if not, you throw it away and generate another number. If the generator's range is less than twice the desired range, then you will throw away less than half of the generated values and producing the next element in the sequence will be amortised O(1). In theory, it might take a long time to generate an individual value, but that's not very likely, and if you use a not-very-good PRNG like a linear congruential generator, you can make it very unlikely indeed by restricting the possible generator parameters.
For LCGs you have a couple of possibilities. You could use a power-of-two modulus, with an odd offset and a multiplier which is 5 mod 8 (and not too far from the square root of the block size), or you could use a prime modulus with almost arbitrary offset and multiplier. Using a prime modulus is computationally more expensive but the deficiencies of LCG are less apparent. Since you don't need to handle arbitrary primes, you can preselect a geometrically-spaced sample and compute the efficient division-by-multiplication algorithm for each one.
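A sketch of that recipe in Python, using the power-of-two variant; the constants are illustrative, not tuned:

def nonrepeating(n, c=12345, seed=0):
    # full-period LCG modulo the next power of two >= n (offset odd,
    # multiplier = 5 mod 8), rejecting values outside the target range;
    # the modulus is < 2n, so under half of the draws are thrown away
    m = 1 << max(3, (n - 1).bit_length())
    a = (int(m ** 0.5) // 8) * 8 + 5     # multiplier near sqrt(m), = 5 (mod 8)
    x = seed % m
    for _ in range(m):
        x = (a * x + c) % m
        if x < n:
            yield x

print(list(nonrepeating(10)))   # each of 0..9 exactly once, pseudo-random order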
Since you're free to use any subrange of the generator's range, you have an additional potential parameter: the offset of the start of the subrange. (Or even offsets, since the subrange doesn't need to be contiguous.) You can also increase the apparent randomness by doing any bijective transformation (XOR/rotates are good, if you're using a power-of-two block size.)
Depending on your application, there are known algorithms to produce block ciphers for subword bit lengths [Note 2], which gives you another possible way to increase randomness and/or add some more bits to the generator state.
Notes
The approximation for the minimum number of bits of state comes directly from Stirling's approximation for N!: log2(N!) ≈ N·(log2 N - log2 e) ≈ N·(log2 N - 1.44). I computed the exact number of bits using the commonly available lgamma function.
With about 30 seconds of googling, I found this paper on researchgate.net; I'm far from knowledgeable enough in crypto to offer an opinion, but it looks credible; also, there are references to other algorithms in its footnotes.

How to compress an array of random positive integers in a certain range?

I want to compress an array of about 10^5 random integers in the range 0 to 2^15. The integers are unsorted and I need to compress them losslessly.
I don't care much about the amount of computation and time needed to run the algorithm, just want to have better compression ratio.
Are there any suggested algorithms for this?
Assuming you don't need to preserve the original order, pass counts instead of the numbers themselves. With 10^5 values spread over 2^15 possible numbers, you can expect each number to appear 3 or 4 times on average, and with 3 bits per number we can count up to 7. Make an array of 2^15 * 3 bits and store each number's count in its 3-bit cell. To handle the rare numbers that occur more than 7 times, also send a list of those numbers with their exact counts; the reader first decodes the 3-bit array and then overwrites the saturated entries with the exact counts.
For your exact example: just encode each number as a 15-bit unsigned int and apply bit packing. This is optimal since you have stated that each integer is uniformly random in [0, 2^15), and the Shannon entropy of this distribution is 15 bits.
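A sketch of that bit packing in Python (10^5 values * 15 bits ≈ 187.5 KB):

def pack15(values):
    # little-endian bit packing: 15 bits per value, no padding between values
    acc = nbits = 0
    out = bytearray()
    for v in values:
        acc |= v << nbits
        nbits += 15
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc)          # final partial byte
    return bytes(out)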
For a more general solution, apply Quantile Compression (https://github.com/mwlon/quantile-compression/). It takes advantage of any smooth-ish data and compresses near optimally on shuffled data. It works by encoding each integer with a Huffman code for its coarse range in the distribution, then an exact offset within that range.
These approaches are both computationally cheap, but more compute won't get you further in this case.

How can I optimize sieve of eratosthenes to just store prime numbers for a very large range?

I have studied the Sieve of Eratosthenes for generating the primes up to a given number by iterating and striking off the composite numbers. The algorithm only needs to iterate up to sqrt(n), where n is the upper bound up to which we need to find all the primes. We know that the number of primes up to n = 10^9 is small compared to the number of composites, yet we use all that space just to mark the composites.
My question is: since we deal with a very large range and the primes are comparatively few, can we modify the algorithm to store just the prime numbers?
Can we store the prime numbers straight away?
Changing the structure from that of a set (sieve) - one bit per candidate - to storing primes (e.g. in a list, vector or tree structure) actually increases storage requirements.
Example: there are 203,280,221 primes below 2^32. An array of uint32_t of that size requires about 775 MiB, whereas the corresponding bitmap (a.k.a. set representation) occupies only 512 MiB (2^32 bits / 8 bits/byte = 2^29 bytes).
The most compact number-based representation with fixed cell size would be storing the halved distance between consecutive odd primes, since up to about 2^40 the halved distance fits into a byte. At 193 MiB for the primes up to 2^32 this is slightly smaller than an odds-only bitmap but it is only efficient for sequential processing. For sieving it is not suitable because, as Anatolijs has pointed out, algorithms like the Sieve of Eratosthenes effectively require a set representation.
The bitmap can be shrunk drastically by leaving out the multiples of small primes. Most famous is the odds-only representation that leaves out the number 2 and its multiples; this halves the space requirement to 256 MiB at virtually no cost in added code complexity. You just need to remember to pull the number 2 out of thin air when needed, since it isn't represented in the sieve.
Even more space can be saved by leaving out multiples of more small primes; this generalisation of the odds-only trick is usually called 'wheeled storage' (see Wheel Factorization on Wikipedia). However, the gain from adding more small primes to the wheel gets smaller and smaller while the wheel modulus ('circumference') increases explosively. Adding 3 removes 1/3rd of the remaining numbers, adding 5 removes a further 1/5th, adding 7 only gets you a further 1/7th, and so on.
Here's an overview of what adding another prime to the wheel can get you. 'ratio' is the size of the wheeled/reduced set relative to the full set that represents every number; 'delta' gives the shrinkage compared to the previous step. 'spokes' refers to the number of prime-bearing spokes which need to be represented/stored; the total number of spokes for a wheel is of course equal to its modulus (circumference).
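The figures follow from Euler's totient (ratio = phi(m)/m, spokes = phi(m); each new prime p shrinks the previous ratio by a factor of 1/p):

prime   modulus   spokes   ratio    delta
2       2         1        50.0%    50.0%
3       6         2        33.3%    33.3%
5       30        8        26.7%    20.0%
7       210       48       22.9%    14.3%
11      2310      480      20.8%     9.1%
13      30030     5760     19.2%     7.7%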
The mod 30 wheel (about 136 MiB for the primes up to 2^32) offers an excellent cost/benefit ratio because it has eight prime-bearing spokes, which means that there is a one-to-one correspondence between wheels and 8-bit bytes. This enables many efficient implementation tricks. However, its cost in added code complexity is considerable despite this fortuitous circumstance, and for many purposes the odds-only sieve ('mod 2 wheel') gives the most bang for buck by far.
There are two additional considerations worth keeping in mind. The first is that data sizes like these often exceed the capacity of memory caches by a wide margin, so that programs can often spend a lot of time waiting for the memory system to deliver the data. This is compounded by the typical access patterns of sieving - striding over the whole range, again and again and again. Speedups of several orders of magnitude are possible by working the data in small batches that fit into the level-1 data cache of the processor (typically 32 KiB); lesser speedups are still possible by keeping within the capacity of the L2 and L3 caches (a few hundred KiB and a few MiB, respectively). The keyword here is 'segmented sieving'.
The second consideration is that many sieving tasks - like the famous SPOJ PRIME1 and its updated version PRINT (with extended bounds and tightened time limit) - require only the small factor primes up to the square root of the upper limit to be permanently available for direct access. That's a comparatively small number: 3512 when sieving up to 2^31 as in the case of PRINT.
Since these primes have already been sieved there's no need for a set representation any more, and since they are few there are no problems with storage space. This means they are most profitably kept as actual numbers in a vector or list for easy iteration, perhaps with additional auxiliary data like current working offset and phase. The actual sieving task is then easily accomplished via a technique called 'windowed sieving'. In the case of PRIME1 and PRINT this can be several orders of magnitude faster than sieving the whole range up to the upper limit, since both tasks only ask for a small number of subranges to be sieved.
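A sketch of such segmented/windowed sieving in Python (function names and window bounds are illustrative):

def small_primes(limit):
    # plain sieve for the seed primes up to the square root of the upper bound
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b'\x00\x00'
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return [i for i, flag in enumerate(sieve) if flag]

def sieve_window(lo, hi, seeds):
    # sieve the window [lo, hi) using only the precomputed seed primes
    window = bytearray([1]) * (hi - lo)
    for p in seeds:
        start = max(p * p, (lo + p - 1) // p * p)   # first multiple of p in the window
        window[start - lo::p] = bytearray(len(window[start - lo::p]))
    return [lo + i for i, flag in enumerate(window) if flag and lo + i > 1]

seeds = small_primes(46341)                  # sqrt(2^31) is about 46341
print(sieve_window(10**9, 10**9 + 100, seeds))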
You can do that (remove numbers found to be composite from your array/linked list), but the time complexity of the algorithm will degrade to something like O(N^2/log(N)) instead of the original O(N*log(log N)). This is because you can no longer jump straight to the multiples 2X, 3X, 4X, ...; you have to find them by looping through your entire compressed list.
You could erase each composite number from the array/vector once you have shown it to be composite. Or, when you fill the array of numbers to put through the sieve, leave out all even numbers (other than 2) and all numbers ending in 5 (other than 5).
If you have studied the sieve properly, you know we don't have the primes to begin with. We have an array whose size equals the range. If you want the range to be 10^9, that must be the size of the array. Whatever the language, you need at least one bit per number to mark whether it is prime.
Even that means you need 10^9 bits = 1.25 * 10^8 bytes, which is more than 100 MB of RAM.
Assuming you have all this, the most optimized sieve takes O(n * log(log n)) time; with n = 10^9, on a machine that executes 10^8 instructions per second, this will still take some minutes.
Now, even then, the number of primes up to 10^9 is q = 50,847,534; saving these takes q * 4 bytes, which is about 200 MB. (More RAM.)
Even if you remove the indexes that are multiples of 2, 3, or 5, that removes only 22 numbers in every 30; you would still need around 230 MB in total (about 33 MB for the reduced bitmap plus ~200 MB for storing the primes).
So, since storing the primes requires a similar amount of memory (of the same order) as the calculation itself, your question, IMO, has no solution.
You can halve the size of the sieve by only 'storing' the odd numbers. This requires code to explicitly deal with the case of testing even numbers. For odd numbers, bit b of the sieve represents n = 2b + 3. Hence bit 0 represents 3, bit 1 represents 5 and so on. There is a small overhead in converting between the number n and the bit index b.
Whether this technique is any use to you depends on the memory/speed balance you require.
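A sketch of that mapping in Python (bit b of the sieve represents n = 2b + 3):

def odd_sieve(limit):
    # bit b represents n = 2*b + 3, so only odd numbers from 3 up are stored
    size = (limit - 1) // 2                 # number of odd candidates 3..limit
    sieve = bytearray([1]) * size
    b = 0
    while True:
        n = 2 * b + 3
        if n * n > limit:
            break
        if sieve[b]:
            first = (n * n - 3) // 2        # bit index of n*n
            sieve[first::n] = bytearray(len(sieve[first::n]))  # step n bits = 2n numbers
        b += 1
    return [2] + [2 * i + 3 for i, flag in enumerate(sieve) if flag]

print(odd_sieve(50))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]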

Using 10 MB of memory for four billion integers (finding the optimal block size) [duplicate]

This question already has answers here:
Generate an integer that is not among four billion given ones
(38 answers)
Closed 7 years ago.
The problem: given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file, assuming you have only 10 MB of memory.
I searched for some solutions. One of them is to store the integers in bit-vector blocks (each block representing a specific range of integers among the 4 billion, each bit in a block representing one integer), and to keep a counter for each block counting the number of integers that fall into it. If a block's count is less than the block's capacity, the block is missing at least one integer, and you scan that block's bit vector to find the missing integers.
My confusion with this solution: it is said that the optimal (smallest) memory footprint is when the array of block counters occupies the same memory as the bit vector. Why is that the optimal smallest footprint?
Here are the calculation details I referred to:
Let N = 2^32.
counters (bytes): blocks * 4
bit vector (bytes): (N / blocks) / 8
blocks * 4 = (N / blocks) / 8
blocks^2 = N / 32
blocks = sqrt(N/2)/4
thanks in advance,
Lin
Why it is the smallest memory footprint:
In the solution you proposed, there are two phases:
Count number of integers in each block
This uses 4*(#blocks) bytes of memory.
Use a bit vector, each bit representing an integer in the block.
This uses (blocksize/8) bytes of memory, which is (N/blocks)/8.
Setting the two to be equal results in blocks = sqrt(N/32), as you have mentioned.
This is the optimal because the memory required is the maximum of the memory required in each phase (which must both be executed). After the 1st phase, you can forget the counters, except for which block to search in for phase 2.
Optimization
If you let each counter saturate at its maximum value, you don't really need 4 bytes per counter; 3 bytes are enough, since a counter never needs to exceed the number of integers a block can hold.
In this case, phase 1 uses 3*blocks bytes of memory, and phase 2 uses (N/blocks)/8. Therefore the optimum is blocks = sqrt(N/24). If N is 4 billion, the number of blocks is approximately 12,910, and the block size is 309,838 integers per block, whose count fits in 3 bytes.
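A sketch of the two phases in Python. For simplicity it splits 2^32 into 2^16 blocks of 2^16 values (both phases stay far below 10 MB) rather than using the sqrt-optimal counter width, and read_ints is a stand-in for whatever re-reads the file:

BLOCK = 1 << 16

def missing_integer(read_ints):
    counts = [0] * BLOCK                  # phase 1: tally values per block
    for v in read_ints():
        counts[v >> 16] += 1
    # fewer than 2^32 values in total, so some block must be under-full
    target = next(b for b, c in enumerate(counts) if c < BLOCK)

    bits = bytearray(BLOCK // 8)          # phase 2: 8 KB bitmap for that block only
    lo = target << 16
    for v in read_ints():
        if lo <= v < lo + BLOCK:
            off = v - lo
            bits[off >> 3] |= 1 << (off & 7)
    for off in range(BLOCK):
        if not bits[off >> 3] & (1 << (off & 7)):
            return lo + off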
Caveats, and an alternative with good average-case performance
The algorithm you proposed only works if all input integers are distinct. In case they are not, I suggest you simply go with a randomized candidate-set approach: select, say, 1000 candidate integers at random and check whether any of them is absent from the input file. If all are present, try another random set of candidates. While this has poor worst-case performance, it is faster in the average case for most inputs. For example, if the input integers cover 99% of all possible integers, then on average 10 out of 1000 candidates will not be found. You can select the candidates pseudo-randomly so that you never repeat a candidate, and so that after a fixed number of tries you are guaranteed to have tested every possible integer.
If you check sqrt(N) candidate integers per pass, the worst-case performance can be as bad as N*sqrt(N), because you might have to scan all N integers sqrt(N) times.
You can avoid the worst case time if you use this alternative, and if it doesn't work for the first set of candidate integers, you switch to your proposed solution. This might give better average case performance (this is a common strategy in sorting where quicksort is used first, before switching to heapsort for example if it appears that the worst case input is present).
# assumes all integers are positive and fit into an int
# very slow, but definitely uses less than 10 MB of RAM
int generate_unique_integer(file input_file)
{
    int largest = 0
    while (not eof(input_file))
    {
        i = read(input_file)
        if (i > largest) largest = i
    }
    return largest + 1  # larger than the largest integer in the input file
}

non-repeating random numbers

I need to generate around 9-100 million non-repeating random numbers, ranging from zero to the amount of numbers generated, and I need them to be generated very quickly. Several answers to similar questions proposed simply shuffling an array in order to get the random numbers, and others proposed using a bloom filter. The question is, which one is more efficient, and in case of it being the bloom filter, how do I use it?
You don't want random numbers at all. You want exactly the numbers 0 to N-1, in random order.
Simply filling the array and shuffling should be very quick. A proper Fisher-Yates shuffle is O(n), so an array of 100 million should take well under a second in C or even Java, slightly slower in a higher-level language like Python.
You only have to generate N-1 random numbers to do the shuffle (maybe up to 1.3N if you use rejection sampling to get perfect uniformity), so the speed will depend largely on how fast your RNG is.
You'll never need to look up whether a number has already been generated; that will be deadly slow no matter which algorithm you use, especially toward the end of the run.
If you need slightly fewer than N total numbers, fill the array from 0 to N-1, then just abort the shuffle early and take the partial result. Only if the amount of numbers you need is very small compared to their range should you consider the generate-and-check-for-dups approach. In that case Bob Floyd's algorithm might be good.
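For reference, Bob Floyd's algorithm in Python: it draws k distinct values uniformly from range(n) using only O(k) memory and exactly k random numbers:

import random

def floyd_sample(n, k):
    # invariant: after processing j, 'chosen' is a uniform subset of range(j + 1)
    chosen = set()
    for j in range(n - k, n):
        t = random.randint(0, j)            # inclusive bounds
        chosen.add(t if t not in chosen else j)
    return chosen

print(sorted(floyd_sample(1_000_000_000, 10)))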
As an alternative you could use an appropriately sized block cypher. Use the block cypher to encrypt the numbers 0, 1, 2, ... and you will get a series of non-repeating random numbers out. Exactly what series will depend on the key you use. They are guaranteed not to repeat, because a block cypher is a reversible permutation.
For 64 bit numbers use DES, for 32 bit use Hasty Pudding (which allows a large range of block sizes), or write your own simple Feistel cypher. Assuming that security is not a big issue for this, writing your own is possible.
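A sketch of such a home-made Feistel cypher in Python. The round function (truncated SHA-256) is an arbitrary choice here: this gives a repeatable permutation, not a secure cipher:

import hashlib

def feistel32(x, key=b'some key', rounds=4):
    # balanced Feistel network on a 32-bit block: always a bijection,
    # so feistel32(0), feistel32(1), ... never repeats a value
    left, right = x >> 16, x & 0xFFFF
    for r in range(rounds):
        data = key + bytes([r]) + right.to_bytes(2, 'big')
        f = int.from_bytes(hashlib.sha256(data).digest()[:2], 'big')
        left, right = right, left ^ f
    return (left << 16) | right

ids = (feistel32(i) for i in range(100_000_000))   # 100 million distinct values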
It is certainly better to create an algorithm that shuffles the numbers. If you use a seed, for example the server microtime or a timestamp, you can get a different random sequence for each millisecond.
Start by creating an array with a range function, with as many numbers as you like. Then set a seed to control the pseudo-randomness.
So, instead of rand, use shuffle: build the range 1 to 90, set the seed, then shuffle the array. You now have all the numbers in a random order (corresponding to the seed); change the seed to get another result.
The order of the numbers is the result: ball 1 is index 0 of the array, e.g. ball 1: 42, ball 2: 10, ball 3: 50.
You can also use a slice function in a loop, incrementing the slice length as you go: slice(array, 0, 1) gives ball 1, slice(array, 0, 2) gives balls 1 and 2, slice(array, 0, 3) the first three, and so on.
That's the logic; I hope it helps.
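The same idea in Python:

import random

random.seed(1234567)          # e.g. a timestamp; the same seed reproduces the same order
balls = list(range(1, 91))    # the numbers 1 to 90
random.shuffle(balls)
print(balls[0])               # ball 1
print(balls[:3])              # balls 1-3, like the incremental slices above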
