Predicting next bit in a non random sequence - random

I have a non-random sequence of bits. I can tell that the sequence is non-random because it fails the runs test; moreover, the sequence has an autocorrelation of 0.4 at lag 1 (right before the cutoff), whereas the partial autocorrelation function has a sinusoidal behaviour. Any suggestions on how to exploit those regularities, without assuming that the sequence follows a binomial distribution?

Related

Shuffle sequential numbers without a buffer

I am looking for a shuffle algorithm to shuffle a set of sequential numbers without buffering. Another way to state this is that I’m looking for a random sequence of unique numbers that have a given period.
Your typical Fisher–Yates shuffle needs to hold all of the elements it is going to shuffle, so that isn’t going to work.
A Linear-Feedback Shift Register (LFSR) does what I want, but only works for periods that are powers-of-two less two. Here is an example of using a 4-bit LFSR to shuffle the numbers 1-14:
Input:   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Output:  8  12  14   7   4  10   5  11   6   3   2   1   9  13
The first row is the input, and the second row the output. What’s nice is that the state is very small: just the current index. You can start at any index and get a different set of numbers (starting at 1 yields: 8, 12, 14; starting at 9: 6, 3, 2), although the sequence is always the same (5 is always followed by 11). If I want a different sequence, I can pick a different generator polynomial.
The limitations of the LFSR are that the periods are always a power of two less two (the min and max are always the same, thus unshuffled) and there are not enough generator polynomials to allow every possible random sequence.
A block cipher algorithm would work. Every key produces a uniquely shuffled set of numbers. However, all block ciphers (that I know about) have power-of-two block sizes, and usually a fixed or limited number of block sizes. A block cipher with an arbitrary non-binary block size would be perfect if such a thing exists.
There are a couple of projects I have that could benefit from such an algorithm. One is for small embedded micros that need to produce a shuffled sequence of numbers with a period larger than the memory they have available (think Arduino Uno needing to shuffle 1 to 100,000).
Does such an algorithm exist? If not, what things might I search for to help me develop such an algorithm? Or is this simply not possible?
Edit 2022-01-30
I have received a lot of good feedback and I need to better explain what I am searching for.
In addition to the Arduino example, where memory is an issue, there is also the shuffle of a large number of records (billions to trillions). The desire is to have a shuffle applied to these records without needing a buffer to hold the shuffle order array, or the time needed to build that array.
I do not need an algorithm that could produce every possible permutation, but a large number of permutations. Something like a typical block cipher in counter mode where each key produces a unique sequence of values.
A Linear Congruential Generator using coefficients to produce the desired sequence period will only produce a single sequence. This is the same problem for a Linear Feedback Shift Register.
Format-Preserving Encryption (FPE), such as AES FFX, shows promise and is where I am currently focusing my attention. Additional feedback welcome.
It is certainly not possible to produce an algorithm which could potentially generate every possible sequence of length N with fewer than N(log2 N - 1.45) bits of state, because there are N! possible sequences and each state can generate exactly one sequence. If your hypothetical Arduino application could produce every possible sequence of 100,000 numbers, it would require at least 1,516,705 bits of state, a bit more than 185 KiB, which is probably more memory than you want to devote to the problem [Note 1].
That's also a lot more memory than you would need for the shuffle buffer; that's because the PRNG driving the shuffle algorithm also doesn't have enough state to come close to being able to generate every possible sequence. It can't generate more different sequences than the number of different possible states that it has.
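As a rough check of the figure above, the bound can be computed with lgamma, as the notes describe. This is only a sketch in Python; the function name min_state_bits is made up for illustration and is not part of the original answer.

```python
import math


def min_state_bits(n):
    """Bits needed to give every permutation of n items its own state.

    This is ceil(log2(n!)). math.lgamma(n + 1) returns ln(n!), so dividing
    by ln(2) converts the result to bits.
    """
    return math.ceil(math.lgamma(n + 1) / math.log(2))


bits = min_state_bits(100_000)
# Convert bits to bytes, then to KiB (1 KiB = 1024 bytes).
print(bits, "bits, roughly", round(bits / 8 / 1024), "KiB")
```

For N = 100,000 the result should land at about 1.5 million bits, in line with the estimate quoted above.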
So you have to make some compromise :-)
One simple algorithm is to start with some parametrisable generator which can produce non-repeating sequences for a large variety of block sizes. Then you just choose a block size which is at least as large as your target range but not "too much larger"; say, less than twice as large. Then you just select a subrange of the block size and start generating numbers. If the generated number is inside the subrange, you return its offset; if not, you throw it away and generate another number. If the generator's range is less than twice the desired range, then you will throw away less than half of the generated values and producing the next element in the sequence will be amortised O(1). In theory, it might take a long time to generate an individual value, but that's not very likely, and if you use a not-very-good PRNG like a linear congruential generator, you can make it very unlikely indeed by restricting the possible generator parameters.
For LCGs you have a couple of possibilities. You could use a power-of-two modulus, with an odd offset and a multiplier which is 5 mod 8 (and not too far from the square root of the block size), or you could use a prime modulus with almost arbitrary offset and multiplier. Using a prime modulus is computationally more expensive but the deficiencies of LCG are less apparent. Since you don't need to handle arbitrary primes, you can preselect a geometrically-spaced sample and compute the efficient division-by-multiplication algorithm for each one.
Since you're free to use any subrange of the generator's range, you have an additional potential parameter: the offset of the start of the subrange. (Or even offsets, since the subrange doesn't need to be contiguous.) You can also increase the apparent randomness by doing any bijective transformation (XOR/rotates are good, if you're using a power-of-two block size.)
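The generate-and-discard idea with a power-of-two LCG might look something like the sketch below. It is illustrative only: the name lcg_shuffle and the default parameters are assumptions, and the full-period check relies on the Hull–Dobell conditions (odd increment, multiplier congruent to 1 mod 4), of which the "5 mod 8" multiplier suggested above is a special case.

```python
def lcg_shuffle(n, multiplier=5, increment=1, start=0):
    """Yield each integer in range(n) exactly once, in a scrambled order.

    Uses a full-period LCG over the next power of two >= n and throws away
    values that fall outside range(n). The only state is one integer.
    """
    modulus = 1
    while modulus < n:
        modulus <<= 1
    # Hull-Dobell conditions for a full period with a power-of-two modulus.
    if increment % 2 != 1 or multiplier % 4 != 1:
        raise ValueError("need an odd increment and multiplier % 4 == 1")
    state = start % modulus
    produced = 0
    while produced < n:
        if state < n:  # inside the target subrange: keep it
            yield state
            produced += 1
        state = (multiplier * state + increment) % modulus


# Different parameters give different orderings of the same 0..9 range.
print(list(lcg_shuffle(10)))
print(list(lcg_shuffle(10, multiplier=13, increment=7, start=3)))
```

Because the modulus is less than twice n, fewer than half of the generated values are discarded. The output is a permutation but looks far from random; the bijective post-processing mentioned above (XOR, rotates) would be a natural next step.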
Depending on your application, there are known algorithms to produce block ciphers for subword bit lengths [Note 2], which gives you another possible way to increase randomness and/or add some more bits to the generator state.
Notes
1. The approximation for the minimum number of states comes directly from Stirling's approximation for N!, but I computed the number of bits by using the commonly available lgamma function.
2. With about 30 seconds of googling, I found this paper on researchgate.net; I'm far from knowledgeable enough in crypto to offer an opinion, but it looks credible; also, there are references to other algorithms in its footnotes.

About Mersenne Twister generator's period

I have read that the Mersenne Twister generator has a period of 2¹⁹⁹³⁷ - 1, but I'm confused about how that can be possible. I see this implementation of the Mersenne Twister algorithm and in the first comment it clearly says that it produces values in the range 0 to 2³² - 1. Therefore, after it has produced 2³² - 1 different random numbers, it will necessarily come back to the starting point (the seed), so the period can be at maximum 2³² - 1.
Also (and tell me if I'm wrong, please), a computer can't hold the number (2¹⁹⁹³⁷ - 1) ~ 4.3×10⁶⁰⁰¹, at least in a single block of memory. What am I missing here?
Your confusion stems from thinking that the output number and the internal state of a PRNG have to be the same thing.
Some very old PRNGs used to do this, such as Linear Congruential Generators. In those generators, the current output was fed back into the generator for the next step.
However, most PRNGs, including the Mersenne Twister, work from a much larger state, which they update and use to generate a 32-bit number (it doesn't really matter which order this is done in for the purposes of this answer).
In fact, the Mersenne Twister does indeed store 624 times 32-bit values, and that is 19968 bits, enough to contain the very long period that you are wondering about. The values are handled separately (as unsigned 32-bit integers), not treated as one giant number in a single-step calculation. The 32-bit random number you get from the output is related to this state, but does not determine the next number by itself.
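To make the state-versus-output distinction concrete, here is a deliberately tiny toy generator (not the Mersenne Twister, and with constants chosen only for illustration): it keeps 16 bits of state but emits just the top 4 bits each step. Its outputs repeat almost immediately, yet the state cycle is far longer than the 16 possible output values.

```python
class ToyGenerator:
    """Toy PRNG with a 16-bit state and a 4-bit output (illustrative only)."""

    MODULUS = 1 << 16

    def __init__(self, seed):
        self.state = seed % self.MODULUS

    def next(self):
        # Full-period LCG step on the hidden state (odd increment,
        # multiplier congruent to 1 mod 4, power-of-two modulus).
        self.state = (25173 * self.state + 13849) % self.MODULUS
        # Only the top 4 bits of the state are revealed to the caller.
        return self.state >> 12


def state_period(seed=1):
    """Count steps until the hidden state returns to its starting value."""
    gen = ToyGenerator(seed)
    start = gen.state
    steps = 0
    while True:
        gen.next()
        steps += 1
        if gen.state == start:
            return steps


print("distinct outputs possible:", 16)
print("period of the state:", state_period())
```

The same reasoning scales up: a 32-bit output says nothing about the period when the state behind it holds 19968 bits.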
You are wrong at:
"Therefore, after it has produced 2³² - 1 different random numbers, it will necessarily come back to the starting point (the seed)..."
It's true that the next number can be the same as one of the numbers already generated, but the internal state of the random number generator will not be the same. (No one told you that every number in the range up to 2³² - 1 will have been generated by the (2³² - 1)th step.) So there's no bijection between the random number generated and the internal state of the generator. The random number generated can be calculated from the state, but you don't even have to do it. You can also step the internal state without creating the random number.
And of course, the computer doesn't store the whole number sequence. It calculates the random number from the internal state. Consider a number sequence like 1, -1, 1, -1, ...: you can generate the Nth number without storing N elements.

generating sorted random numbers without exponentiation involved?

I am looking for a math equation or algorithm which can generate uniform random numbers in ascending order in the range [0,1] without the help of the division operator. I am keen on skipping the division operation because I am implementing it in hardware. Thank you.
Generating the numbers in ascending (or descending) order means generating them sequentially but with the right distribution. That, in turn, means we need to know the distribution of the minimum of a set of size N, and then at each stage we need to use conditioning to determine the next value based on what we've already seen. Mathematically these are both straightforward except for the issue of avoiding division.
You can generate the minimum of N uniform(0,1)'s from a single uniform(0,1) random number U using the algorithm min = 1 - U**(1/N), where ** denotes exponentiation. In other words, the complement of the Nth root of a uniform has the same distribution as the minimum of N uniforms over the range [0,1], which can then be scaled to any other interval length you like.
The conditioning aspect basically says that the k values already generated will have eaten up some portion of the original interval, and that what we now want is the minimum of N-k values, scaled to the remaining range.
Combining the two pieces yields the following logic. Generate the smallest of the N uniforms, scale it by the remaining interval length (1 the first time), and make that result the last value we have generated. Then generate the smallest of N-1 uniforms, scale it by the remaining interval length, and add it to the last one to give you your next value. Lather, rinse, repeat, until you have done them all. The following Ruby implementation gives distributionally correct results, assuming you have read in or specified N prior to this:
last_u = 0.0
N.downto(1) do |i|
  p last_u += (1.0 - last_u) * (1.0 - (rand ** (1.0/i)))
end
but we have that pesky ith root which uses division. However, if we know N ahead of time, we can pre-calculate the inverses of the integers from 1 to N offline and table them.
# inverse[i] holds the pre-calculated value 1.0/i for i in 1..N
last_u = 0.0
N.downto(1) do |i|
  p last_u += (1.0 - last_u) * (1.0 - (rand ** inverse[i]))
end
I don't know of any way to get the correct distributional behavior sequentially without using exponentiation. If that's a show-stopper, you're going to have to give up on either the sequential nature of the process or the uniformity requirement.
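For readers who prefer Python, the tabled-inverse version can be sketched as follows. The function name sorted_uniforms and the rng parameter are illustrative choices; in hardware the table of reciprocals would be computed offline rather than built at call time as it is here.

```python
import random


def sorted_uniforms(n, rng=random.random):
    """Return n uniform(0,1) values generated directly in ascending order."""
    # Table of reciprocals 1/i; index 0 is unused padding. In a hardware
    # setting this table would be precomputed, so no division runs online.
    inverse = [0.0] + [1.0 / i for i in range(1, n + 1)]
    last_u = 0.0
    values = []
    for i in range(n, 0, -1):
        # Minimum of i uniforms, scaled to the remaining interval (last_u, 1).
        last_u += (1.0 - last_u) * (1.0 - rng() ** inverse[i])
        values.append(last_u)
    return values


print(sorted_uniforms(5))
```

The exponentiation is still there, as the answer notes; only the division has been moved out of the loop.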
You can try so-called "stratified sampling", which means you divide the range into bins and then sample randomly from bins. A sample thus generated is more uniform (less clumping) than a sample generated from the entire interval. For this reason, stratified sampling reduces the variance of Monte Carlo estimates (I don't suppose that's important to you, but that's why the method was invented, as a reduction of variance method).
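A minimal sketch of the stratified approach, assuming one sample per equal-width bin (the function name and that one-per-bin choice are illustrative assumptions): the values come out already in ascending order, and the only division is the constant 1/N, which could be precomputed. Note that the result does not have the same distribution as N independent uniforms that were sorted afterwards; it is more evenly spread, as described above.

```python
import random


def stratified_sorted(n, rng=random.random):
    """Draw one uniform sample from each of n equal bins of [0, 1)."""
    width = 1.0 / n  # constant bin width; can be precomputed offline
    # Bin i covers [i * width, (i + 1) * width), so the output is ascending.
    return [(i + rng()) * width for i in range(n)]


print(stratified_sorted(5))
```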
It is an interesting problem to generate numbers in order, but my guess is that to get a uniform distribution over the entire interval, you will have to apply some formulas which require more computation. If you want to minimize computation time, I suspect you cannot do better than generating a sample and then sorting it.

How to program a function to return values on some sort of probability?

This question arose to me while I was playing FIFA.
Presumably, they programmed a complex function which includes all the factors like shooting skills, distance, shot power etc. to calculate the probability that the shot hits the target. How would they have programmed it so that the goal happens according to that probability?
In other words, suppose a function X() should return 1 with probability 89% and 0 with probability 11%. How would I program it so that it returns 1 (approximately) 89 times in 100 trials?
Generate a uniformly-distributed random number between 0 and 1, and return true if the number is less than the desired probability (0.89).
For example, in IPython:
In [13]: from random import random
In [14]: vals = [random() < 0.89 for i in range(10000)]
In [15]: sum(vals)
Out[15]: 8956
In this realisation, 8956 out of the 10000 boolean outcomes are true. If we repeat the experiment, the number will vary around 8900.
That is not how goals are determined in FIFA or other video games. They don't have a function that says, with some probability, the shot makes it or doesn't.
Rather, they simulate a ball actually being kicked into a goal.
The ball will have some speed (based on the "shot power") and some trajectory angle (based on where the player aimed, and some variability based on the character's "shot skill"). Then they allow physics - and the AI of the goalie, if there is one - to take over, and count it as a point only when the ball physically enters the goal.
There is of course still randomness involved, but there is no single variable that decides whether or not a shot will make it.
I'm not 100% sure, but here is one way I would achieve this:
Generate a random integer from 0 to 99. If the number is less than 89, return 1; otherwise return 0.
If you have a random number generator, then you would do something like:
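#include <stdbool.h>
#include <stdlib.h>

bool return_true_89_out_of_100(void) {
    /* rand() returns an integer in [0, RAND_MAX]; scale it to [0, 1). */
    double random_n = (double)rand() / ((double)RAND_MAX + 1.0);
    return random_n < 0.89;
}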
You can generate a crudely random number by, for example, sampling lower bits of the CPU clock or some mathematical tricks.
You're tagged language agnostic, but the answer depends on what random number function(s) are available to you. Furthermore the accuracy may depend on how close to being truly random your generator is (generally they're not that close).
As to random number functions, there tend to be two kinds -- those which generate a number between 0 and 1, and those that generate a number between m and n. Each can be used to derive a percentage easily.
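Both kinds of generator can be turned into a "true with probability p" function. The sketch below uses Python's standard random module; the function names are made up for illustration.

```python
import random


def hit_from_unit(probability, rng=random.random):
    """Use a generator that returns a float in [0, 1)."""
    return 1 if rng() < probability else 0


def hit_from_range(percent, randint=random.randint):
    """Use a generator that returns an integer between m and n (here 1..100)."""
    return 1 if randint(1, 100) <= percent else 0


trials = 10_000
print(sum(hit_from_unit(0.89) for _ in range(trials)) / trials)
print(sum(hit_from_range(89) for _ in range(trials)) / trials)
```

Each printed fraction should come out close to 0.89, varying slightly from run to run.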

Generate random sequence of integers differing by 1 bit without repeats

I need to generate a (pseudo) random sequence of N bit integers, where successive integers differ from the previous by only 1 bit, and the sequence never repeats. I know a Gray code will generate non-repeating sequences with only 1 bit difference, and an LFSR will generate non-repeating random-like sequences, but I'm not sure how to combine these ideas to produce what I want.
Practically, N will be very large, say 1000. I want to randomly sample this large space of 2^1000 integers, but I need to generate something like a random walk because the application in mind can only hop from one number to the next by flipping one bit.
Use any random number generator algorithm to generate an integer between 1 and N (or 0 to N-1 depending on the language). Use the result to determine the index of the bit to flip.
In order to satisfy the no-repeat requirement you will need to store previously generated numbers (thanks ShreevatsaR). Additionally, you may run into a scenario where no non-repeating answers are possible, so this will require a backtracking algorithm as well.
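A minimal sketch of this approach, assuming that memory for the set of visited values is acceptable: at each step it flips one randomly chosen bit whose result has not been seen before. The function name is illustrative, and instead of backtracking this version simply stops if it reaches a dead end.

```python
import random


def bit_flip_walk(nbits, steps, rng=None):
    """Random walk over nbits-wide integers, one bit flip per step, no repeats."""
    rng = rng or random.Random()
    current = rng.getrandbits(nbits)
    visited = {current}
    walk = [current]
    for _ in range(steps):
        # All neighbours at Hamming distance 1 that have not been visited yet.
        fresh = [current ^ (1 << b) for b in range(nbits)
                 if current ^ (1 << b) not in visited]
        if not fresh:
            break  # dead end; a complete solution would backtrack here
        current = rng.choice(fresh)
        visited.add(current)
        walk.append(current)
    return walk


for value in bit_flip_walk(8, 5):
    print(format(value, "08b"))
```

With N around 1000, dead ends are extremely unlikely for any walk that fits in memory, but the visited set still grows linearly with the length of the walk.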
This makes me think of fractals - following a boundary in a julia set or something along those lines.
If N is 1000, use a 2^500 x 2^500 fractal bitmap (obviously don't generate it in advance - you can derive each pixel on demand, and most won't be needed). Each pixel move is one pixel up, down, left or right following the boundary line between pixels, like a simple bitmap tracing algorithm. So long as you start at the edge of the bitmap, you should return to the edge of the bitmap sooner or later - following a specific "colour" boundary should always give a closed curve with no self-crossings, if you look at the unbounded version of that fractal.
The x and y axes of the bitmap will need "Gray coded" co-ordinates, of course - a bit like oversized Karnaugh maps. Each step in the tracing (one pixel up, down, left or right) equates to a single-bit change in one bitmap co-ordinate, and therefore in one bit of the resulting values in the random walk.
EDIT
I just realised there's a problem. The more wrinkly the boundary, the more likely you are in the tracing to hit a point where you have a choice of directions, such as...
* | .
---+---
. | *
Whichever direction you enter this point, you have a choice of three ways out. Choose the wrong one of the other two and you may return back to this point, therefore this is a possible self-crossing point and possible repeat. You can eliminate the continue-in-the-same-direction choice - whichever way you turn should keep the same boundary colours to the left and right of your boundary path as you trace - but this still leaves a choice of two directions.
I think the problem can be eliminated by having at least three colours in the fractal, and by always keeping the same colour to one particular side (relative to the trace direction) of the boundary. There may be an "as long as the fractal isn't too wrinkly" proviso, though.
The last resort fix is to keep a record of points where this choice was available. If you return to the same point, backtrack and take the other alternative.
While an algorithm like this:
seed()
i = random(0, n)
repeat:
    i ^= 1 << random(0, bitlen - 1)
    yield i
…would return a random sequence of integers differing each by 1 bit, it would require a huge array for backtracing to ensure uniqueness of numbers.
Furthermore, your running time would increase exponentially(?) with increasing density of your backtrace, as the chance to hit a new and non-repeating number decreases with every number in the sequence.
To reduce time and space one could try to incorporate one of these:
Bloom Filter
Use a Bloom Filter to drastically reduce the space (and time) needed for uniqueness-backtracing.
Because Bloom Filters occasionally produce false positives, some numbers in your sequence would be falsely detected as repeats (sic!) and thus skipped.
While the use of a Bloom Filter would reduce the space and time your running time would still increase exponentially(?)…
Hilbert Curve
A Hilbert Curve represents a non-repeating (kind of pseudo-random) walk on a square plane (or in a cube) with each step being of length 1.
Using a Hilbert Curve (on an appropriate distribution of values) one might be able to get rid of the need for a backtrace entirely.
To enable your sequence to get a seed you'd generate n (n being the dimension of your plane/cube/hypercube) random numbers between 0 and s (s being the length of your plane's/cube's/hypercube's sides).
Not only would a Hilbert Curve remove the need for a backtrace, it would also make the sequencer run in O(1) per number (in contrast to the use of a backtrace, which would make your running time increase exponentially(?) over time…)
To seed your sequence you'd wrap-shift your n-dimensional distribution by random displacements in each of its n dimension.
PS: You might get better answers here: CSTheory @ StackExchange (or not, see comments)

Resources