Sampling packets [closed]

I am programming a hardware packet processor and am trying to sample packets. My goal is to keep one out of every 10 packets. However, I am aware that one cannot simply keep every 10th packet, as that would not be a correct sampling method.
I use a random number generator, but the number is always between 0 and 2^n - 1. That is, if the random number is 4 bits wide, the generator produces a number between 0 and 15.
My first approach was to generate a number of 10 or 11 bits. Say a 10-bit number n, which would range from 0 to 1023. I planned to take n modulo 10 and keep the packet for one of the 10 possible results. Unfortunately, this is not possible, as my hardware packet processor does not support modulo operations unless n is known at compile time.
My second option was to make a simple if:
// random() returns a uniformly distributed number between 0 and 1023
int<10> n = random();
if (n < 102) {
    keep_packet();
} else {
    drop_packet();
}
I wonder whether this last method is indeed a correct sampling method, and whether it is as correct as taking n modulo 10.

It depends on what you mean by "correct".
Assuming n = random() outputs independent uniform random integers in the range [0, 1023] (1024 values), then checking whether n < 102 will succeed 102/1024 of the time: 102 out of 1024 values will be accepted and the rest rejected. Does it correctly succeed 10% of the time? No. Does it correctly succeed 102/1024 of the time? Yes.
Likewise, under the same assumption, checking whether n % 10 == 9 will succeed 102/1024 of the time: 102 out of 1024 values will be accepted (namely, those that end in nine) and the rest rejected. Does it correctly succeed 10% of the time? No. Does it correctly succeed 102/1024 of the time? Yes.
There is a third way to proceed: rejection sampling. Keep generating n = random() until n < 1020, then check whether n < 102. This will correctly succeed 10% of the time (102/1020 = 1/10). The tradeoff, though, is that there is a 4/1024 chance of requiring more than one random number, a (4/1024)^2 chance of requiring more than two, and so on. In fact, rejection sampling can run forever in the worst case (as rejection methods can in general), and stopping the method after K rejections will introduce bias. However, you can make the bias as small as you like by choosing K appropriately (it may or may not be acceptable for your hardware device to "freeze").
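For illustration, here is a minimal C sketch of that rejection loop, assuming a hypothetical rand10() helper that returns uniform 10-bit values:

#include <stdint.h>

/* Hypothetical source of uniform random values in [0, 1023]. */
extern uint32_t rand10(void);

/* Returns 1 (keep) exactly 10% of the time: draws of 1020..1023 are
   rejected, so the remaining 1020 values split into ten equal groups. */
int keep_one_in_ten(void) {
    uint32_t n;
    do {
        n = rand10();
    } while (n >= 1020);   /* 4/1024 chance of needing another draw */
    return n < 102;
}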
See also:
Frugal conversion of uniformly distributed random numbers from one range to another
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?

Related

Shuffle sequential numbers without a buffer

I am looking for a shuffle algorithm to shuffle a set of sequential numbers without buffering. Another way to state this is that I’m looking for a random sequence of unique numbers that have a given period.
Your typical Fisher–Yates shuffle needs to hold all of the elements it is going to shuffle, so that isn't going to work.
A Linear-Feedback Shift Register (LFSR) does what I want, but only works for periods that are powers-of-two less two. Here is an example of using a 4-bit LFSR to shuffle the numbers 1-14:
in:   1   2   3   4   5   6   7   8   9  10  11  12  13  14
out:  8  12  14   7   4  10   5  11   6   3   2   1   9  13
The first row is the input, and the second row the output. What's nice is that the state is very small: just the current index. You can start at any index and get a different set of numbers (starting at 1 yields: 8, 12, 14; starting at 9: 6, 3, 2), although the sequence is always the same (5 is always followed by 11). If I want a different sequence, I can pick a different generator polynomial.
The limitations of the LFSR are that the periods are always a power of two less two (the min and max are always the same, thus unshuffled) and that there are not enough generator polynomials to allow every possible random sequence.
A block cipher algorithm would work. Every key produces a uniquely shuffled set of numbers. However, all block ciphers (that I know about) have power-of-two block sizes, and usually a fixed or limited set of block sizes. A block cipher with an arbitrary non-binary block size would be perfect, if such a thing exists.
There are a couple of projects I have that could benefit from such an algorithm. One is for small embedded micros that need to produce a shuffled sequence of numbers with a period larger than the memory they have available (think Arduino Uno needing to shuffle 1 to 100,000).
Does such an algorithm exist? If not, what things might I search for to help me develop such an algorithm? Or is this simply not possible?
Edit 2022-01-30
I have received a lot of good feedback and I need to better explain what I am searching for.
In addition to the Arduino example, where memory is an issue, there is also the shuffle of a large number of records (billions to trillions). The desire is to have a shuffle applied to these records without needing a buffer to hold the shuffle order array, or the time needed to build that array.
I do not need an algorithm that could produce every possible permutation, but a large number of permutations. Something like a typical block cipher in counter mode where each key produces a unique sequence of values.
A Linear Congruential Generator using coefficients to produce the desired sequence period will only produce a single sequence. This is the same problem for a Linear Feedback Shift Register.
Format-Preserving Encryption (FPE), such as AES FFX, shows promise and is where I am currently focusing my attention. Additional feedback welcome.
It is certainly not possible to produce an algorithm which could potentially generate every possible sequence of length N with less than N(log2 N - 1.45) bits of state, because there are N! possible sequences and each state can generate exactly one sequence. If your hypothetical Arduino application could produce every possible sequence of 100,000 numbers, it would require at least 1,516,705 bits of state, a bit more than 185 KiB, which is probably more memory than you want to devote to the problem [Note 1].
That's also a lot more memory than you would need for the shuffle buffer; that's because the PRNG driving the shuffle algorithm also doesn't have enough state to come close to being able to generate every possible sequence. It can't generate more different sequences than the number of different possible states that it has.
So you have to make some compromise :-)
One simple algorithm is to start with some parametrisable generator which can produce non-repeating sequences for a large variety of block sizes. Then choose a block size which is at least as large as your target range but not "too much larger"; say, less than twice as large. Then select a subrange of the block and start generating numbers: if a generated number is inside the subrange, return its offset; if not, throw it away and generate another. If the generator's range is less than twice the desired range, you will throw away less than half of the generated values, and producing the next element in the sequence is amortised O(1). In theory, it might take a long time to generate an individual value, but that's not very likely, and if you use a not-very-good PRNG like a linear congruential generator, you can make it very unlikely indeed by restricting the possible generator parameters.
For LCGs you have a couple of possibilities. You can use a power-of-two modulus with an odd offset and a multiplier which is 5 mod 8 (and not too far from the square root of the block size), or you can use a prime modulus with an almost arbitrary offset and multiplier. Using a prime modulus is computationally more expensive, but the deficiencies of the LCG are less apparent. Since you don't need to handle arbitrary primes, you can preselect a geometrically-spaced sample of primes and precompute the efficient division-by-multiplication constants for each one.
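As a rough illustration (my own sketch, not the answer's exact construction), here is what a power-of-two-modulus LCG filtered down to a subrange could look like in C; the stated parameter conditions give a full period by the Hull-Dobell theorem:

#include <stdint.h>

/* A sketch of the idea, assuming a power-of-two modulus 2^k chosen so
   that range <= 2^k < 2*range. With an odd increment and a multiplier
   that is 5 mod 8, the LCG has full period (Hull-Dobell), i.e. it visits
   every value in [0, 2^k) exactly once per period, so skipping
   out-of-range values yields a permutation of [0, range). */
typedef struct {
    uint64_t state;  /* current LCG state, any seed < 2^k */
    uint64_t mult;   /* multiplier, must be 5 mod 8 */
    uint64_t inc;    /* increment, must be odd */
    uint64_t mask;   /* 2^k - 1 */
    uint64_t range;  /* size of the sequence being shuffled */
} perm_gen;

/* Returns the next element of the pseudo-random permutation of
   [0, range); amortised O(1), since fewer than half the draws are
   rejected. */
uint64_t perm_next(perm_gen *g) {
    do {
        g->state = (g->state * g->mult + g->inc) & g->mask;
    } while (g->state >= g->range);
    return g->state;
}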
Since you're free to use any subrange of the generator's range, you have an additional potential parameter: the offset of the start of the subrange. (Or even offsets, since the subrange doesn't need to be contiguous.) You can also increase the apparent randomness by doing any bijective transformation (XOR/rotates are good, if you're using a power-of-two block size.)
Depending on your application, there are known algorithms to produce block ciphers for subword bit lengths [Note 2], which gives you another possible way to increase randomness and/or add some more bits to the generator state.
Notes
The approximation for the minimum number of states comes directly from Stirling's approximation for N!, but I computed the number of bits by using the commonly available lgamma function.
With about 30 seconds of googling, I found this paper on researchgate.net; I'm far from knowledgeable enough in crypto to offer an opinion, but it looks credible; there are also references to other algorithms in its footnotes.

Using 10 MB of memory for four billion integers (about finding the optimal block size) [duplicate]

This question already has answers here:
Generate an integer that is not among four billion given ones
The problem: given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file, assuming you have only 10 MB of memory.
I searched for solutions; one of them is to store the integers in bit-vector blocks (each block representing a specific range of integers within the 4 billion range, each bit in a block representing one integer) and to use a counter for each block to count the number of integers in it. Then, if a block's count is less than the block's capacity, that block is missing at least one integer, and you scan the block's bit vector to find the missing integers.
My confusion with this solution: it is mentioned that the optimal smallest footprint occurs when the array of block counters occupies the same memory as the bit vector. Why is that the optimal smallest footprint?
Here are the calculation details I referred to:
Let N = 2^32.
counters (bytes): blocks * 4
bit vector (bytes): (N / blocks) / 8
blocks * 4 = (N / blocks) / 8
blocks^2 = N / 32
blocks = sqrt(N / 32)
thanks in advance,
Lin
Why it is the smallest memory footprint:
In the solution you proposed, there are two phases:
Count the number of integers in each block.
This uses 4*(#blocks) bytes of memory.
Use a bit vector, with each bit representing an integer in the chosen block.
This uses (blocksize/8) bytes of memory, which is (N/blocks)/8.
Setting the two to be equal results in blocks = sqrt(N/32), as you mentioned.
This is the optimal because the memory required is the maximum of the memory required in each phase (which must both be executed). After the 1st phase, you can forget the counters, except for which block to search in for phase 2.
Optimization
If your counter saturates when it reaches capacity, you don't really need 4 bytes per counter, but rather 3 bytes; a counter reaches capacity when it equals the number of integers a block can hold.
In this case, phase 1 uses 3*blocks bytes of memory, and phase 2 uses (N/blocks)/8. Therefore, the optimum is blocks = sqrt(N/24). If N is 4 billion, the number of blocks is approximately 12,910 and the block size is 309,838 integers per block, a count which fits in 3 bytes.
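Here is a minimal C sketch of the two phases, assuming the file holds distinct 32-bit unsigned integers in binary form (the block count and helper names are illustrative, not from the original answer):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NBLOCKS 11586   /* about sqrt(2^32 / 32) */
#define BLOCKSIZE (((1ULL << 32) + NBLOCKS - 1) / NBLOCKS)

uint32_t find_missing(FILE *f) {
    /* Phase 1: count how many input integers land in each block. */
    static uint32_t counts[NBLOCKS];   /* about 45 KB */
    uint32_t x;
    while (fread(&x, sizeof x, 1, f) == 1)
        counts[x / BLOCKSIZE]++;

    /* Some block must be under capacity, since inputs are distinct. */
    uint32_t b = 0;
    while (counts[b] >= BLOCKSIZE) b++;

    /* Phase 2: a bit vector over just that block (about 46 KB). */
    uint8_t *bits = calloc((BLOCKSIZE + 7) / 8, 1);
    rewind(f);
    while (fread(&x, sizeof x, 1, f) == 1)
        if (x / BLOCKSIZE == b)
            bits[x % BLOCKSIZE / 8] |= 1u << (x % BLOCKSIZE % 8);

    for (uint64_t i = 0; i < BLOCKSIZE; i++) {
        uint64_t v = b * BLOCKSIZE + i;
        if (v > 0xFFFFFFFFULL) break;          /* past the 32-bit range */
        if (!(bits[i / 8] & (1u << (i % 8))))
            return (uint32_t)v;                /* a missing integer */
    }
    return 0;   /* unreachable when a missing integer exists */
}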
Caveats, and alternative with good average case performance
The algorithm you proposed only works if all input integers are distinct. If they are not, I suggest a randomized candidate-set approach: select, say, 1000 candidate integers at random, and check whether any of them is absent from the input file. If all candidates appear, try another random set. While this has poor worst-case performance, it is faster in the average case for most inputs. For example, if the input integers cover 99% of the possible integers, then on average 10 out of 1000 candidate integers will not be found. You can select the candidates pseudo-randomly so that you never repeat a candidate, which also guarantees that after a fixed number of tries you will have tested every possible integer.
If you check sqrt(N) candidate integers each time, the worst-case running time can be as bad as N*sqrt(N), because you might have to scan all N integers sqrt(N) times.
You can avoid the worst-case time if you use this alternative first and, if it fails for the first set of candidate integers, switch to your proposed solution. This can give better average-case performance (a common strategy in sorting, where quicksort is used first and then, for example, heapsort if the input appears to be a worst case).
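A rough sketch of the randomized candidate-set idea (my illustration; the 1000-candidate figure follows the text, and the crude random draw stands in for a real generator):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NCAND 1000

/* Pick NCAND random candidates, scan the file once, and return any
   candidate that never appears; returns -1 if all were present
   (retry with a fresh candidate set in that case). */
int64_t find_unused(FILE *f) {
    uint32_t cand[NCAND];
    int seen[NCAND] = {0};
    for (int i = 0; i < NCAND; i++)
        cand[i] = (uint32_t)rand() * 65536u + (uint32_t)rand();  /* crude draw */

    uint32_t x;
    while (fread(&x, sizeof x, 1, f) == 1)
        for (int i = 0; i < NCAND; i++)
            if (cand[i] == x) seen[i] = 1;

    for (int i = 0; i < NCAND; i++)
        if (!seen[i]) return cand[i];
    return -1;
}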
/* Assumes all integers are positive and fit into an int.
   Very slow, but definitely uses less than 10 MB of RAM. */
int generate_unique_integer(FILE *input_file)
{
    int largest = 0;
    int i;
    while (fscanf(input_file, "%d", &i) == 1) {
        if (i > largest) largest = i;
    }
    return largest + 1;   /* larger than the largest integer in the file;
                             assumes largest < INT_MAX */
}

random number generator test

How would you test whether a random number generator is generating actually random numbers?
My approach: first build a hash table of size M, where M is a prime number. Then take each number produced by the random number generator, reduce it mod M,
and see whether the results fill the whole table or only part of it.
That's my approach. Can we prove it with visualization?
Since I have very little knowledge about testing, can you suggest a thorough approach to this question? Thanks in advance.
You should be aware that you cannot guarantee the random number generator is working properly. Note that even with a perfectly uniform distribution over [1,10], there is a 10^-10 chance of getting ten 10s in a random sample of 10 numbers.
Is it likely? Of course not.
So - what can we do?
We can statistically argue that the combination (10,10,...,10) is unlikely if the random number generator is indeed uniformly distributed. This concept is called hypothesis testing. With this approach we can say: "with a certainty level of x%, we can reject the hypothesis that the data is taken from a uniform distribution".
A common way to do this is Pearson's chi-squared test. The idea is similar to yours: you fill in a table, check the observed (generated) number of values in each cell, and compare with the expected number of values per cell under the null hypothesis (in your case, k/M per cell, where M is the range's size and k is the total count of numbers drawn).
You then compute a test statistic from the data (see the Wikipedia article for exactly what this computation is) and check whether that number is plausible under a chi-squared distribution. If it is, you cannot reject the null hypothesis; if it is not, you can reject, with x% certainty, the claim that the data came from a uniform random generator.
EDIT: example:
You have a die, and you want to check whether it is "fair" (uniformly distributed over [1,6]). Throw it 200 times (for example) and create the following table:
number:                 1     2     3     4     5     6
empirical occurrences: 37    41    30    27    32    33
expected occurrences:  33.3  33.3  33.3  33.3  33.3  33.3
Now, according to Pearson's test, the statistic is:
X = ((37-33.3)^2)/33.3 + ((41-33.3)^2)/33.3 + ... + ((33-33.3)^2)/33.3
X = (13.69 + 59.29 + 10.89 + 39.69 + 1.69 + 0.09) / 33.3
X = 3.76
For a random C ~ ChiSquare(5), the probability of being higher than 3.76 is about 0.58 (which is not improbable)1.
So we cannot reject the null hypothesis, and we can conclude that the data is probably uniformly distributed over [1,6].
(1) We usually reject the null hypothesis if the value is smaller than 0.05, but this is very case dependent.
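For concreteness, a small C sketch of computing this statistic with the counts from the example above (my illustration):

#include <stdio.h>

/* Pearson's chi-squared statistic for the die example: the sum over
   cells of (observed - expected)^2 / expected, with expected = 200/6
   per face. */
int main(void) {
    const double observed[6] = {37, 41, 30, 27, 32, 33};
    const double expected = 200.0 / 6.0;
    double x = 0.0;
    for (int i = 0; i < 6; i++) {
        double d = observed[i] - expected;
        x += d * d / expected;
    }
    /* Compare x against a chi-squared distribution with 5 degrees of
       freedom (6 cells - 1) to obtain the p-value. */
    printf("X = %.2f\n", x);   /* prints X = 3.76 */
    return 0;
}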
My naive idea:
The generator should follow a distribution. Do a reasonable number of runs, then plot the values on a graph and fit a regression curve to the points. If it correlates with the shape of the expected distribution, you're good. (This is also possible in 1D with projections and histograms, and fully automatable with the right tool, e.g. MATLAB.)
You can also use the diehard tests as it was mentioned before, that is surely better but involves much less intuition, at least on your side.
Let's say you want to generate a uniform distribution on the interval [0, 1].
Then one possible test is:
counter = 0
for i from 1 to sample-size
    if a < random-being-tested() < b
        counter += 1
return counter / sample-size
and see whether the result is close to b - a.
Of course, you should define a function taking a and b (with 0 <= a < b <= 1) as inputs and returning the difference between counter/sample-size and b - a. Loop through possible a, b, say multiples of 0.01 with a < b, and print out a, b whenever the difference is larger than a preset epsilon, say 0.001.
Those are the a, b for which there are too many outliers.
If you let sample-size be 5000, your random-being-tested will be called about 5000 * 5050 times in total (there are 5050 such (a, b) pairs), hopefully not too bad.
I had the same problem. When I finished writing my code (using an external RNG engine),
I looked at the results and found that all of them failed the chi-squared test whenever I had too many results.
My code generated random numbers and kept buckets counting how many results fell into each range.
I don't know why the chi-squared test fails when I have a lot of results.
During my research I saw that C#'s Random.Next() fails for any random range, and that some of the numbers have better odds than others. Furthermore, I saw that the RNGCryptoServiceProvider random provider does not handle big numbers well:
when trying to get numbers in the range 0-1,000,000,000, the numbers in the lower range 0-300M had better odds of appearing.
As a result, I use RNGCryptoServiceProvider, and if my range is higher than 100M I combine the number myself (RandomHigh * 100M + RandomLow), where the ranges of both randoms are smaller than 100M, so it works well.
Good luck!

Get 10000+ unique random numbers (performance) [duplicate]

Possible Duplicate:
Create Random Number Sequence with No Repeats
I'd like to write a URL shortener that uses only numbers as the short string.
I don't want to count up; I want each new number to be random (or pseudo-random).
My first-draft algorithm would then look like this (pseudocode):
do
{
    number = random(0, 10000)
}
while (datastore.contains(number))
datastore.store(number, url)
The problem with this implementation: as the datastore contains more numbers, it becomes more likely that the loop executes multiple times, so performance degrades over time.
Isn't there a better way to get a random number that is not already in use?
1) fill an array with sequential values
2) shuffle the array
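A minimal sketch of that idea in C (assuming the array is filled once up front and then consumed front to back):

#include <stdlib.h>

/* Fill ids[0..n-1] with 0..n-1, then Fisher-Yates shuffle. Consuming
   the array in order then yields unique pseudo-random numbers. */
void make_shuffled_ids(int *ids, int n) {
    for (int i = 0; i < n; i++)
        ids[i] = i;
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* slightly biased; fine for a sketch */
        int tmp = ids[i];
        ids[i] = ids[j];
        ids[j] = tmp;
    }
}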
Use encryption. Since encryption is reversible, unique inputs generate unique outputs. For 64-bit numbers, use a cipher with a 64-bit block size. For smaller block sizes, such as 32 bits or 16 bits, have a look at the Hasty Pudding cipher.
Whatever block size you need, just encrypt the numbers 0, 1, 2, ... (in the appropriate block size) to generate as many unique non-sequential numbers as you need.
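For illustration, a minimal sketch of the counter-encryption idea (my own toy construction, not a published cipher): a tiny 4-round Feistel network, which is a bijection on 32-bit values regardless of the round function, so feeding it 0, 1, 2, ... with a fixed key yields a key-dependent sequence of unique numbers.

#include <stdint.h>

/* Arbitrary mixing function for illustration, not cryptographic. */
static uint16_t round_fn(uint16_t half, uint32_t key) {
    uint32_t x = (half ^ key) * 0x9E3779B1u;   /* multiply by a mixing constant */
    return (uint16_t)(x >> 16);
}

/* A Feistel network is invertible whatever round_fn does, so distinct
   inputs always map to distinct outputs. */
uint32_t feistel_encrypt(uint32_t value, const uint32_t keys[4]) {
    uint16_t left = (uint16_t)(value >> 16), right = (uint16_t)value;
    for (int round = 0; round < 4; round++) {
        uint16_t tmp = left ^ round_fn(right, keys[round]);
        left = right;
        right = tmp;
    }
    return ((uint32_t)left << 16) | right;
}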
Some related questions: # 2394246, # 54059, # 158716, # 196017, and # 1608181.
The proper approach depends on how many numbers you will generate and on whether real-time performance is required. If you draw no more than a small fraction of the numbers available in a range, the average time per number for your code snippet is O(1), with a slight increase in time for later numbers, but still O(1). See, for example, my answer to question #1608181, in which I show that getting k numbers from a range of more than 2*k numbers with such code is O(k). (That answer also has C code to generate M numbers from a range of N numbers in O(M) time when M < N/2, and explains how to use it for O(M) time when M >= N/2.)
If you want O(1) performance with a hard time limit, you can use the program just mentioned to pre-load an array, or you can shuffle the whole range of integers, as mentioned by Justin. After that preprocessing, each access is O(1). But if you know you won't draw more than, say, 3000 numbers from your 1...10000 range, and don't have a hard time limit, the code you have will run in O(1) time on average, with the probability of k passes decreasing like 0.3^k; i.e., at worst about a 70% chance of 1 pass, 21% for 2, 6% for 3, 2% for 4, 0.6% for 5, and so forth.

Fast generation of random numbers that appear random

I am looking for an efficient way to generate numbers that a human would perceive as being random. Basically, I think of this as avoiding long sequences of 0 or 1 bits. I expect humans to be viewing the bit pattern, and a very low-powered CPU should be able to calculate nearly a thousand of these per second.
There are two different concepts I can think of to do this, but I am lost finding an efficient way of accomplishing them.
Generate a random number with a fixed number of one bits. For a 32-bit random number, this requires up to 31 random numbers, using the Knuth selection algorithm. Is there a more efficient way to generate a random number with some number of bits set? Unfortunately, 0000FFFF doesn't look very random.
Some form of "part-wise" density seems like it would look better, but I can't come up with a clear way of doing so. I imagine going through each chunk, calculating how far it is from the ideal density, and trying to increase the bit density of the next chunk. This sounds complex.
Hopefully there's another algorithm that I haven't thought about for this. Thanks in advance for your help.
[EDIT]
I should be clearer with what I ask -
(a) Is there an efficient way to generate random numbers without "long" runs of a single bit, where "long" is a tunable parameter?
(b) Other suggestions on what would make a number appear to be less-random?
A linear feedback shift register probably does what you want.
Edit in light of an updated question: You should look at a shuffle bag, although I'm not sure how fast this could run. See also this question.
I don't really know what you mean by bit patterns that "look" random. Is there some algorithm for defining what that is? One way might be to build an array consisting only of those numbers which are random enough for your purpose, then randomly select elements from that array and push them onto the stream. The thing you seem to be trying to do seems bizarre to me and may be doomed to failure, though. What happens if you have two 32-bit numbers which, taken individually, meet your criteria for apparent randomness, but which placed side by side make a sufficiently long run of 0s or 1s to look made up?
Finally, I couldn't resist this.
You need to decide by exactly what rules you decide if something "looks random". Then you take a random number generator that produces enough "real randomness" for your purpose, and every time it generates a number that doesn't look random enough, you throw that number away and generate a new one.
Or you directly produce a sequence of "random" bits and every time the random generator outputs the "wrong" next bit (that would make it look not-random), you just flip that bit.
Here's what I'd do. I'd use a number like 00101011100101100110100101100101 and rotate it by some random amount each time.
But are you sure that a typical pseudo-random generator wouldn't do? Have you tried it? You won't get very many long strings of 0s and 1s anyhow.
If you're going to use a library random number and you're worried about too many or too few bits being set, there are cheap ways of counting bits.
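For example (a sketch; the builtin shown is GCC/Clang-specific):

#include <stdint.h>

/* Two cheap ways to count the set bits in a 32-bit word. */
int popcount_builtin(uint32_t x) {
    return __builtin_popcount(x);   /* maps to a popcount instruction where available */
}

int popcount_kernighan(uint32_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;                 /* clear the lowest set bit */
        count++;
    }
    return count;
}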
Random numbers often have long sequences of 1s and 0s, so I'm not sure I fully understand why you can't use a simple linear congruential generator and shift in or out however many bits you need. They're blazing fast, look extremely random to the naked eye, and you can choose coefficients that will yield random integers in whatever positive range you need. If you need 32 "random looking" bits, just generate four random numbers and take the low 8 bits from each.
You don't really need to implement your own at all though, since in most languages the random library already implements one.
If you're determined that you want a particular density of 1s, though, you could always start with a number that has the required number of 1s set
int a = 0x00FF;
then use a bit-twiddling hack to implement a bit-level shuffle of the bits in that number.
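One possible sketch of such a bit-level shuffle (my illustration, not from the original answer): a Fisher-Yates over bit positions, which preserves the number of 1s and hence the chosen density.

#include <stdint.h>
#include <stdlib.h>

/* Shuffle the bits of a word: for each position i, swap bit i with a
   randomly chosen bit j <= i. The popcount of x is unchanged. */
uint32_t shuffle_bits(uint32_t x) {
    for (int i = 31; i > 0; i--) {
        int j = rand() % (i + 1);               /* slightly biased; fine for a sketch */
        uint32_t bi = (x >> i) & 1, bj = (x >> j) & 1;
        if (bi != bj)
            x ^= (1u << i) | (1u << j);          /* swap differing bits */
    }
    return x;
}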
If you are looking to avoid long runs, how about something simple like:
#include <cstdlib>
#include <cmath>

class generator {
public:
    generator() : last_num(false), run_count(1) { }
    bool next_bit() {
        // Probability of flipping grows with the current run length.
        const bool flip = rand() > RAND_MAX / pow(2, run_count);
        // RAND_MAX >> run_count ?
        if (flip) {
            run_count = 1;
            last_num = !last_num;
        } else {
            ++run_count;
        }
        return last_num;
    }
private:
    bool last_num;
    int run_count;
};
Runs become less likely the longer they go on. You could also use RAND_MAX / (1 + run_count) if you wanted longer runs.
Since you care most about run length, you could generate random run lengths instead of random bits, so as to give them the exact distribution you want.
The mean run length in random binary data is of course 2 (the sum of n/2^n), and the modal length is 1. Here are some random bits (I swear this is a single run, I didn't pick a value to make my point):
0111111011111110110001000101111001100000000111001010101101001000
See there's a run length of 8 in there. This is not especially surprising, since run length 8 should occur roughly every 256 bits and I've generated 64 bits.
If this doesn't "look random" to you because of excessive run lengths, then generate run lengths with whatever distribution you want. In pseudocode:
loop
    get a random number n
    output n 1-bits
    get a random number n
    output n 0-bits
endloop
You'd probably want to discard some initial data from the stream, or randomise the first bit, to avoid the problem that as it stands, the first bit is always 1. The probability of the Nth bit being 1 depends on how you "get a random number", but for anything that achieves "shortish but not too short" run lengths it will soon be as close to 50% as makes no difference.
For instance "get a random number" might do this:
get a uniformly-distributed random number n from 1 to 81
if n is between 1 and 54, return 1
if n is between 55 and 72, return 2
if n is between 73 and 78, return 3
if n is between 79 and 80, return 4
return 5
The idea is that the probability of a run of length N is one third the probability of a run of length N-1, instead of one half. This will give much shorter average run lengths, and a longest run of 5, and would therefore "look more random" to you. Of course it would not "look random" to anyone used to dealing with sequences of coin tosses, because they'd think the runs were too short. You'd also be able to tell very easily with statistical tests that the value of digit N is correlated with the value of digit N-1.
This code uses at least log2(81) = 6.34 "random bits" to generate on average 1.49 bits of output, so it is slower than just generating uniformly-distributed bits. But it shouldn't be much more than about 7/1.49 = 5 times slower, and an LFSR is pretty fast to start with.
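A compact C sketch of this run-length scheme (my illustration; rand() stands in for whatever uniform source you actually use):

#include <stdio.h>
#include <stdlib.h>

/* Run lengths 1..5 with probabilities 54:18:6:2:1 out of 81, so each
   extra bit of run length is a third as likely as the last. */
static int random_run_length(void) {
    int n = rand() % 81 + 1;   /* uniform in 1..81 (rand() bias ignored) */
    if (n <= 54) return 1;
    if (n <= 72) return 2;
    if (n <= 78) return 3;
    if (n <= 80) return 4;
    return 5;
}

/* Emit the given number of alternating runs of 1s and 0s. */
void emit_bits(int runs) {
    int bit = rand() & 1;      /* randomise the first bit, per the text */
    while (runs-- > 0) {
        for (int len = random_run_length(); len > 0; len--)
            putchar('0' + bit);
        bit = !bit;
    }
    putchar('\n');
}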
This is how I would examine the number:
const int max_repeated_bits = 4; /* or any other number that you prefer */

/* Returns 1 if x contains no run of more than max_repeated_bits 1-bits:
   each AND with (x << 1) shortens every run of 1s by one. */
int examine_1(unsigned int x) {
    for (int i = 0; i < max_repeated_bits; ++i) x &= (x << 1);
    return x == 0;
}

/* Returns 1 if x contains no long run of either 1s or 0s. */
int examine(unsigned int x) {
    return examine_1(x) && examine_1(~x);
}
Then just generate a number x; if examine(x) returns 0, reject it and try again. The probability of getting a 32-bit number with more than 4 equal bits in a row is about 2/3, so you would need about 3 random generator calls per number. However, if you allow longer runs, it gets better: say, the probability of getting more than 6 bits in a row is only about 20%, so you would need only 1.25 calls per number.
There are various variants of linear feedback shift registers, such as shrinking and self-shrinking which modify the output of one LFSR based on the output of another.
The design of these attempts to create random numbers where the probability of getting two bits the same in a row is 0.5, of getting three in a row is 0.25, and so on.
It should be possible to chain two LFSRs to inhibit or invert the output when a sequence of similar bits occurs. The first LFSR uses a conventional primitive polynomial, and its output feeds the second. The second shift register is shorter and doesn't have a primitive polynomial; instead, it is tapped to invert the output if all its bits are the same, so no run can exceed the size of the second shift register.
Obviously this destroys the randomness of the output - if you have N bits in a row, the next bit is completely predictable. Messing around with using the output of another random source to determine whether or not to invert the output would defeat the second shift register - you wouldn't be able to detect the difference between that and just one random source.
Check out the GSL. I believe it has some functions that do just what you want. They are at least guaranteed to be random bit strings. I'm not sure whether they would LOOK random, since that's more of a psychological question.
Can't believe nobody mentioned this:
If you want a longest run (period) of 2N repeats:
PeopleRandom()
{
    while (1)
    {
        Number = randomN_bitNumber();
        if (Number && Number != MaxN_BitNumber)
            return Number;
    }
}
this gives much better results, in terms of the number of tosses, than using a 32-bit (or similar) rand
pros:
you only toss values 2/2^N of the time.
larger N gives better results.
Since exactly half of the values do not have a 1 in the middle bit splitting the value, you can go with a larger N than you otherwise would, if you can tolerate a larger longest run less than half the time.
One simple approach would be to generate one bit at a time, with a tuning parameter to control the probability that each new bit matches the previous one. By setting the probability below 0.5, you can generate sequences that are less likely to contain long runs of repeating bits (and you can tune that likelihood). Setting p = 0 gives a repeating 1010101010101010 sequence; setting p = 1 gives a sequence of all 0s or all 1s.
Here is some C# to demonstrate:
double p = 0.3; // 0 <= p <= 1, probability of duplicating a bit
var r = new Random();
int bit = r.Next(2);
for (int i = 0; i < 100; i++)
{
    if (r.NextDouble() > p)
    {
        bit = (bit + 1) % 2;
    }
    Console.Write(bit);
}
This might well be too slow for your needs, since you need to generate a random double in order to obtain each new random bit. You could instead generate a random byte and use each pair of bits to generate the new bit (i.e. if both bits are zero then keep the same bit, otherwise flip it), if you're happy with the equivalent of a fixed p = 0.25.
Furthermore, it's still possible to get long sequences of repeated bits, you've just lowered the probability of doing so.