Is there a name for this algorithm? (I've been calling it changeBinary)
DESCRIPTION:
You take a binary string as input.
The first bit of the output is the same as the first bit of the input.
Every bit after that is 0 if the bit at that index of the input string is the same as the bit at the previous index in the input string. Otherwise, it's 1.
For example,
Input: 00011000001010100001001000010011
Output: 00010100001111110001101100011010
Here is a simple javascript implementation:
var changeBinary = function(binaryString){
    var output = binaryString[0] === '0' ? '0' : '1';
    for (var i = 1; i < binaryString.length; i++){
        var nextBit = binaryString[i] === binaryString[i - 1] ? '0' : '1';
        output += nextBit;
    }
    return output;
}
OBSERVATIONS:
First, it seems that if you keep applying the algorithm to a string, it eventually returns to its original value. Second, the number of iterations it takes to do so seems to always be a power of 2 (including 2^0 = 1). For example, if you apply the changeBinary function above 32 times to the string above, it will return to the original value.
Has anyone ever encountered this before, and if so, do you know of any other information about it?
It just seems to me like this is something so simple and basic that someone must have studied it more in depth.
Any feedback would be greatly appreciated.
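To check the observation above, here is a minimal Python sketch (function names are mine) that reimplements the transformation and counts how many applications it takes for a string to return to itself:
def change_binary(s):
    out = [s[0]]
    for i in range(1, len(s)):
        out.append('0' if s[i] == s[i - 1] else '1')
    return ''.join(out)

def cycle_length(s):
    t, count = change_binary(s), 1
    while t != s:
        t, count = change_binary(t), count + 1
    return count

print(cycle_length('00011000001010100001001000010011'))  # 32, a power of two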

It may be interesting to know that this is x ^ (x << 1) on a BigInteger (or, if you limit the length of the strings, the same thing but on a fixed-size integer), also describable as clmul(x, 3).
Carryless multiplication, which is essentially just like normal multiplication, but instead of adding the partial products you XOR them, has some fairly nice properties, such as being commutative and associative. The associative property is especially of interest since it allows you to reason easily about what composing your algorithm with itself a couple of times does: for example
changeBinary o changeBinary is clmul(clmul(x, 3), 3) = clmul(x, clmul(3, 3)) = clmul(x, 5)
That it's a carryless multiplication by 3 also explains why it "undoes" itself when applied often enough, as the carryless multiplicative inverse of 3 is the number with all bits set, which with 32 bits is 0xffffffff, and which can be formed as 3^31 (with carryless exponentiation). This also follows from the equivalence of a carryless square to a "bit-spread", which takes a bit string abcd to a0b0c0d, and thus clpow(3, 32) = 1: five spreads have spread the bits so far apart that only the original lsb is left over, and the rest does not fit in a 32-bit number.
And that also gives a faster inversion, because the number with all bits set can be decomposed into small number of (carryless) factors:
3 x 5 x 17 x 257 x 65537 ...
With a number of factors that is the base two logarithm of the number of bits (rounded up).
Since x ^ (x >> 1) converts a number to Gray Code, I suppose you might call this a "mirrored" Gray Code. The same trick with the factors is used "in the mirror image" to convert a Gray Code back to binary:
x ^= x >> 1 // this is like a "mirror" of x = clmul(x, 3)
x ^= x >> 2 // 5
x ^= x >> 4 // 17
x ^= x >> 8
x ^= x >> 16
Here we just flip the direction of the shift to get:
x ^= x << 1
x ^= x << 2
x ^= x << 4
x ^= x << 8
x ^= x << 16
Which is clmul(x, 0xffffffff) and has also been called PS-XOR(x)
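To make that concrete, here is a small Python sketch (32-bit words assumed, the names forward and inverse are mine) of the integer form of the map, x ^ (x << 1), and of the shift cascade that inverts it:
MASK = 0xffffffff  # fixed 32-bit width

def forward(x):
    # the transformation viewed as clmul(x, 3), truncated to 32 bits
    return (x ^ (x << 1)) & MASK

def inverse(x):
    # clmul by 3 x 5 x 17 x 257 x 65537 = 0xffffffff (carryless), i.e. PS-XOR
    x ^= (x << 1) & MASK
    x ^= (x << 2) & MASK
    x ^= (x << 4) & MASK
    x ^= (x << 8) & MASK
    x ^= (x << 16) & MASK
    return x

for x in (0, 1, 0x13371337, 0xdeadbeef, MASK):
    assert inverse(forward(x)) == x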

The algorithm you described is an example of Delta Encoding.


Verify that a number can be decomposed into powers of 2

Is it possible to verify that a number can be decomposed into a sum of powers of 2 where the exponents are sequential?
Is there an algorithm to check this?
Example: a number of the form 2^k + 2^(k+1) + ... + 2^(k+m), where k >= 0 and m >= 0.
The binary representation would have a single, consecutive group of 1 bits.
To check this, you could first identify the value of the least significant bit, add that bit to the original value, and then check whether the result is a power of 2.
This leads to the following formula for a given x:
(x & (x + (x & -x))) == 0
This expression is also true when x is zero. If that case needs to be rejected as a solution, you need an extra condition for that.
In Python:
def f(x):
    return x > 0 and (x & (x + (x & -x))) == 0
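For example (input values chosen here just for illustration):
print(f(0b0111000))  # True: a single block of consecutive 1 bits
print(f(0b0101000))  # False: the 1 bits are not consecutive
print(f(0))          # False: zero is rejected by the x > 0 test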
This can be done in an elegant way using bitwise operations to check whether the binary representation of the number is a single block of consecutive 1 bits, followed by perhaps some 0s.
The expression x & (x - 1) replaces the lowest 1 in the binary representation of x with a 0. If we call that number y, then y | (y >> 1) sets each bit to be a 1 if it had a 1 to its immediate left. If the original number x was a single block of consecutive 1 bits, then the result is the same as the number x that we started with, because the 1 which was removed will be replaced by the shift. On the other hand, if x is not a single block of consecutive 1 bits, then the shift will add at least one other 1 bit that wasn't there in the original x, and they won't be equal.
That works if x has more than one 1 bit, so the shift can put back the one that was removed. If x has only a single 1 bit, then removing it will result in y being zero. So we can check for that, too.
In Python:
def is_sum_of_consecutive_powers_of_two(x):
    y = x & (x - 1)
    z = y | (y >> 1)
    return x == z or y == 0
Note that this returns True when x is zero, and that's the correct result if "a sum of consecutive powers of two" is allowed to be the empty sum. Otherwise, you will have to write a special case to reject zero.
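A short trace of that check on two sample values may help:
x = 0b0110                 # a single block of 1s
y = x & (x - 1)            # 0b0100: the lowest 1 is cleared
z = y | (y >> 1)           # 0b0110: the shift puts it back
print(x == z or y == 0)    # True

x = 0b0101                 # two separated 1s
y = x & (x - 1)            # 0b0100
z = y | (y >> 1)           # 0b0110: no longer equal to x
print(x == z or y == 0)    # False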
A number can be represented as the sum of powers of 2 with sequential exponents iff its binary representation has all 1s adjacent.
E.g. the set of numbers that can be represented as 2^n + 2^(n-1), n >= 1, is exactly those with two adjacent ones in the binary representation.
just like this:
bool check(unsigned x) {          /* the number you want to check */
    int flag = 0;                 /* 0: before the run of 1s, 1: inside it, 2: after it */
    while (x) {
        if (x & 1) {
            if (flag == 2) return false;  /* a second run of 1s: not decomposable */
            flag = 1;
        } else if (flag == 1) {
            flag = 2;             /* the run of 1s has ended */
        }
        x >>= 1;
    }
    return true;
}
O(log n).

Generating random number in the range 0-N [duplicate]

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator, like rand() in C++.
So rand() is a pseudo-random number generator which chooses a natural number between 0 and RAND_MAX, which is a constant defined in cstdlib (see this article for a general overview on rand()).
Now what happens if you want to generate a random number between say 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers between 0 and 2 with equal probability!
When rand() returns 0, 3, 6, or 9, rand()%3 == 0. Therefore, P(0) = 4/11
When rand() returns 1, 4, 7, or 10, rand()%3 == 1. Therefore, P(1) = 4/11
When rand() returns 2, 5, or 8, rand()%3 == 2. Therefore, P(2) = 3/11
This does not generate the numbers between 0 and 2 with equal probability. Of course for small ranges this might not be the biggest issue but for a larger range this could skew the distribution, biasing the smaller numbers.
So when does rand()%n return a range of numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1. In this case, along with our earlier assumption rand() does return a number between 0 and RAND_MAX with equal probability, the modulo classes of n would also be equally distributed.
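A tiny Python enumeration of the toy example above (RAND_MAX = 10, n = 3) makes the counts explicit:
from collections import Counter

RAND_MAX = 10
print(Counter(r % 3 for r in range(RAND_MAX + 1)))
# {0: 4, 1: 4, 2: 3}  ->  P(0) = P(1) = 4/11, P(2) = 3/11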
So how do we solve this problem? A crude way is to keep generating random numbers until you get a number in your desired range:
int x;
do {
    x = rand();
} while (x >= n);
but that's inefficient for low values of n, since you only have an n/RAND_MAX chance of getting a value in your range, and so you'll need to perform RAND_MAX/n calls to rand() on average.
A more efficient approach would be to take some large range with a length divisible by n, like RAND_MAX - RAND_MAX % n, keep generating random numbers until you get one that lies in the range, and then take the modulus:
int x;
do {
    x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
x %= n;
For small values of n, this will rarely require more than one call to rand().
Works cited and further reading:
CPlusPlus Reference
Eternally Confuzzled
Keeping on selecting a random number until it falls in your range is a good way to remove the bias.
Update
We could make the code fast if we search for an x in a range whose size is divisible by n.
// Assumptions
// rand() in [0, RAND_MAX]
// n in (0, RAND_MAX]
int x;
// Keep searching for an x in a range divisible by n
do {
    x = rand();
} while (x >= RAND_MAX - (RAND_MAX % n));
x %= n;
The above loop should be very fast, say 1 iteration on average.
#user1413793 is correct about the problem. I'm not going to discuss that further, except to make one point: yes, for small values of n and large values of RAND_MAX, the modulo bias can be very small. But using a bias-inducing pattern means that you must consider the bias every time you calculate a random number and choose different patterns for different cases. And if you make the wrong choice, the bugs it introduces are subtle and almost impossible to unit test. Compared to just using the proper tool (such as arc4random_uniform), that's extra work, not less work. Doing more work and getting a worse solution is terrible engineering, especially when doing it right every time is easy on most platforms.
Unfortunately, the implementations of the solution are all incorrect or less efficient than they should be. (Each solution has various comments explaining the problems, but none of the solutions have been fixed to address them.) This is likely to confuse the casual answer-seeker, so I'm providing a known-good implementation here.
Again, the best solution is just to use arc4random_uniform on platforms that provide it, or a similar ranged solution for your platform (such as Random.nextInt on Java). It will do the right thing at no code cost to you. This is almost always the correct call to make.
If you don't have arc4random_uniform, then you can use the power of opensource to see exactly how it is implemented on top of a wider-range RNG (arc4random in this case, but a similar approach could also work on top of other RNGs).
Here is the OpenBSD implementation:
/*
 * Calculate a uniformly distributed random number less than upper_bound
 * avoiding "modulo bias".
 *
 * Uniformity is achieved by generating new random numbers until the one
 * returned is outside the range [0, 2**32 % upper_bound). This
 * guarantees the selected random number will be inside
 * [2**32 % upper_bound, 2**32) which maps back to [0, upper_bound)
 * after reduction modulo upper_bound.
 */
u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
    u_int32_t r, min;

    if (upper_bound < 2)
        return 0;

    /* 2**32 % x == (2**32 - x) % x */
    min = -upper_bound % upper_bound;

    /*
     * This could theoretically loop forever but each retry has
     * p > 0.5 (worst case, usually far better) of selecting a
     * number inside the range we need, so it should rarely need
     * to re-roll.
     */
    for (;;) {
        r = arc4random();
        if (r >= min)
            break;
    }

    return r % upper_bound;
}
It is worth noting the latest commit comment on this code for those who need to implement similar things:
Change arc4random_uniform() to calculate 2**32 % upper_bound as
-upper_bound % upper_bound. Simplifies the code and makes it the
same on both ILP32 and LP64 architectures, and also slightly faster on
LP64 architectures by using a 32-bit remainder instead of a 64-bit
remainder.
Pointed out by Jorden Verwer on tech#
ok deraadt; no objections from djm or otto
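A quick Python check of that identity, emulating 32-bit unsigned wraparound for a few arbitrarily chosen bounds:
for upper_bound in (2, 3, 7, 1000, 2**31 - 1):
    wrapped = (2**32 - upper_bound) % 2**32   # what -upper_bound becomes as a u_int32_t
    assert wrapped % upper_bound == 2**32 % upper_bound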
The Java implementation is also easily findable (see previous link):
public int nextInt(int n) {
    if (n <= 0)
        throw new IllegalArgumentException("n must be positive");

    if ((n & -n) == n)  // i.e., n is a power of 2
        return (int)((n * (long)next(31)) >> 31);

    int bits, val;
    do {
        bits = next(31);
        val = bits % n;
    } while (bits - val + (n-1) < 0);
    return val;
}
Definition
Modulo Bias is the inherent bias in using modulo arithmetic to reduce an output set to a subset of the input set. In general, a bias exists whenever the mapping between the input and output set is not equally distributed, as in the case of using modulo arithmetic when the size of the output set is not a divisor of the size of the input set.
This bias is particularly hard to avoid in computing, where numbers are represented as strings of bits: 0s and 1s. Finding truly random sources of randomness is also extremely difficult, but is beyond the scope of this discussion. For the remainder of this answer, assume that there exists an unlimited source of truly random bits.
Problem Example
Let's consider simulating a die roll (0 to 5) using these random bits. There are 6 possibilities, so we need enough bits to represent the number 6, which is 3 bits. Unfortunately, 3 random bits yields 8 possible outcomes:
000 = 0, 001 = 1, 010 = 2, 011 = 3
100 = 4, 101 = 5, 110 = 6, 111 = 7
We can reduce the size of the outcome set to exactly 6 by taking the value modulo 6, however this presents the modulo bias problem: 110 yields a 0, and 111 yields a 1. This die is loaded.
Potential Solutions
Approach 0:
Rather than rely on random bits, in theory one could hire a small army to roll dice all day and record the results in a database, and then use each result only once. This is about as practical as it sounds, and more than likely would not yield truly random results anyway (pun intended).
Approach 1:
Instead of using the modulus, a naive but mathematically correct solution is to discard results that yield 110 and 111 and simply try again with 3 new bits. Unfortunately, this means that there is a 25% chance on each roll that a re-roll will be required, including each of the re-rolls themselves. This is clearly impractical for all but the most trivial of uses.
Approach 2:
Use more bits: instead of 3 bits, use 4. This yields 16 possible outcomes. Of course, re-rolling anytime the result is greater than 5 makes things worse (10/16 = 62.5%) so that alone won't help.
Notice that 2 * 6 = 12 < 16, so we can safely take any outcome less than 12 and reduce that modulo 6 to evenly distribute the outcomes. The other 4 outcomes must be discarded, and then re-rolled as in the previous approach.
Sounds good at first, but let's check the math:
4 discarded results / 16 possibilities = 25%
In this case, 1 extra bit didn't help at all!
That result is unfortunate, but let's try again with 5 bits:
32 % 6 = 2 discarded results; and
2 discarded results / 32 possibilities = 6.25%
A definite improvement, but not good enough in many practical cases. The good news is, adding more bits will never increase the chances of needing to discard and re-roll. This holds not just for dice, but in all cases.
As demonstrated, however, adding 1 extra bit may not change anything. In fact if we increase our roll to 6 bits, the probability remains 6.25%.
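To see the pattern, here is a short Python sketch tabulating the discard probability for the six-outcome roll as the bit count grows:
for bits in range(3, 11):
    outcomes = 2 ** bits
    discarded = outcomes % 6
    print(bits, discarded / outcomes)
# 3 and 4 bits: 0.25, 5 and 6 bits: 0.0625, 7 and 8 bits: 0.015625, ...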
This begs 2 additional questions:
If we add enough bits, is there a guarantee that the probability of a discard will diminish?
How many bits are enough in the general case?
General Solution
Thankfully the answer to the first question is yes. The problem with 6 is that 2^x mod 6 flips between 2 and 4, which happen to differ from each other by a factor of 2, so that for an odd x > 1,
[2^x mod 6] / 2^x == [2^(x+1) mod 6] / 2^(x+1)
Thus 6 is an exception rather than the rule. It is possible to find larger moduli that yield consecutive powers of 2 in the same way, but eventually this must wrap around, and the probability of a discard will be reduced.
Without offering further proof, in general using double the number
of bits required will provide a smaller, usually insignificant,
chance of a discard.
Proof of Concept
Here is an example program that uses OpenSSL's libcrypto to supply random bytes. When compiling, be sure to link to the library with -lcrypto, which most everyone should have available.
#include <iostream>
#include <cassert>
#include <cstdint>
#include <limits>
#include <openssl/rand.h>

volatile uint32_t dummy;
uint64_t discardCount;

uint32_t uniformRandomUint32(uint32_t upperBound)
{
    assert(RAND_status() == 1);
    uint64_t randomPool;
    // 2**64 % upperBound values must be discarded from the top of the range so that
    // the values that remain map uniformly onto [0, upperBound). 2**64 itself does not
    // fit in 64 bits, so compute it as (max % upperBound + 1) % upperBound.
    uint64_t discard = (std::numeric_limits<uint64_t>::max() % upperBound + 1) % upperBound;
    RAND_bytes((uint8_t*)(&randomPool), sizeof(randomPool));
    while (randomPool > (std::numeric_limits<uint64_t>::max() - discard)) {
        RAND_bytes((uint8_t*)(&randomPool), sizeof(randomPool));
        ++discardCount;
    }
    return randomPool % upperBound;
}

int main() {
    discardCount = 0;
    const uint32_t MODULUS = (1ul << 31) - 1;
    const uint32_t ROLLS = 10000000;
    for (uint32_t i = 0; i < ROLLS; ++i) {
        dummy = uniformRandomUint32(MODULUS);
    }
    std::cout << "Discard count = " << discardCount << std::endl;
}
I encourage playing with the MODULUS and ROLLS values to see how many re-rolls actually happen under most conditions. A sceptical person may also wish to save the computed values to a file and verify that the distribution is uniform.
Mark's Solution (The accepted solution) is Nearly Perfect.
int x;
do {
    x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
x %= n;
However, it has a caveat: it discards 1 valid set of outcomes in any scenario where RAND_MAX (RM) is 1 less than a multiple of N (where N = the number of possible valid outcomes).
That is, when the "count of values discarded" (D) is equal to N, they are actually a valid set (V), not an invalid set (I).
What causes this is that at some point Mark loses sight of the difference between N and RAND_MAX.
N is a set whose valid members comprise only positive integers, as it contains a count of responses that would be valid. (E.g.: Set N = {1, 2, 3, ... n }.)
RAND_MAX, however, is a set which (as defined for our purposes) includes any number of non-negative integers.
In its most generic form, what is defined here as RAND_MAX is the set of all valid outcomes, which could theoretically include negative numbers or non-numeric values.
Therefore RAND_MAX is better defined as the set of "possible responses".
However, N operates against the count of the values within the set of valid responses, so even as defined in our specific case, RAND_MAX will be a value one less than the total number it contains.
Using Mark's solution, values are discarded when: X >= RM - RM % N
EG:
Rand Max Value (RM) = 255
Valid Outcome (N) = 4
When X >= 252, discarded values for X are: 252, 253, 254, 255
So, if Random Value Selected (X) = {252, 253, 254, 255}
Number of discarded Values (I) = RM % N + 1 == N
IE:
I = RM % N + 1
I = 255 % 4 + 1
I = 3 + 1
I = 4
X >= ( RM - RM % N )
255 >= (255 - 255 % 4)
255 >= (255 - 3)
255 >= (252)
Discard Returns $True
As you can see in the example above, when the value of X (the random number we get from the initial function) is 252, 253, 254, or 255 we would discard it even though these four values comprise a valid set of returned values.
IE: When the count of the values Discarded (I) = N (The number of valid outcomes) then a Valid set of return values will be discarded by the original function.
If we describe the difference between the values N and RM as D, ie:
D = (RM - N)
Then as the value of D becomes smaller, the percentage of unneeded re-rolls due to this method increases at each natural multiple of N. (When RAND_MAX is NOT equal to a prime number this is of valid concern.)
EG:
RM=255 , N=2 Then: D = 253, Lost percentage = 0.78125%
RM=255 , N=4 Then: D = 251, Lost percentage = 1.5625%
RM=255 , N=8 Then: D = 247, Lost percentage = 3.125%
RM=255 , N=16 Then: D = 239, Lost percentage = 6.25%
RM=255 , N=32 Then: D = 223, Lost percentage = 12.5%
RM=255 , N=64 Then: D = 191, Lost percentage = 25%
RM=255 , N= 128 Then D = 127, Lost percentage = 50%
Since the percentage of re-rolls needed increases the closer N comes to RM, this can be of valid concern at many different values depending on the constraints of the system running the code and the values being looked for.
To negate this we can make a simple amendment as shown here:
int x;
do {
    x = rand();
} while (x > (RAND_MAX - (((RAND_MAX % n) + 1) % n)));
x %= n;
This provides a more general version of the formula which accounts for the additional peculiarities of using modulus to define your max values.
Examples using a small value for RAND_MAX which is 1 less than a multiple of N:
Mark's original version:
RAND_MAX = 3, n = 2, Values in RAND_MAX = 0,1,2,3, Valid Sets = 0,1 and 2,3.
When X >= (RAND_MAX - ( RAND_MAX % n ) )
When X >= 2 the value will be discarded, even though the set is valid.
Generalized Version 1:
RAND_MAX = 3, n = 2, Values in RAND_MAX = 0,1,2,3, Valid Sets = 0,1 and 2,3.
When X > (RAND_MAX - ( ( RAND_MAX % n ) + 1 ) % n )
When X > 3 the value would be discarded, but this is not a value in the set RAND_MAX, so there will be no discard.
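A quick Python enumeration of that RAND_MAX = 3, n = 2 example, comparing which values each condition discards:
RAND_MAX, n = 3, 2
original = [x for x in range(RAND_MAX + 1) if x >= RAND_MAX - RAND_MAX % n]
amended = [x for x in range(RAND_MAX + 1) if x > RAND_MAX - ((RAND_MAX % n + 1) % n)]
print(original)   # [2, 3]: a valid pair is needlessly discarded
print(amended)    # []: nothing is discarded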
Additionally, there is the case where N should be the number of values in RAND_MAX; in this case, you could set N = RAND_MAX + 1, unless RAND_MAX = INT_MAX.
Loop-wise you could just use N = 1, so that any value of X will be accepted, and put an IF statement in for your final multiplier. But perhaps you have code that has a valid reason to return a 1 when the function is called with n = 1...
So it may be better to use 0, which would normally produce a Div 0 error, to signal that you wish to have n = RAND_MAX + 1.
Generalized Version 2:
int x;
if (n != 0) {
    do {
        x = rand();
    } while (x > (RAND_MAX - (((RAND_MAX % n) + 1) % n)));
    x %= n;
} else {
    x = rand();
}
Both of these solutions resolve the issue with needlessly discarded valid results, which will occur when RM + 1 is a multiple of n.
The second version also covers the edge case scenario when you need n to equal the total possible set of values contained in RAND_MAX.
The modified approach in both is the same and allows for a more general solution to the need of providing valid random numbers and minimizing discarded values.
To reiterate:
The Basic General Solution which extends Mark's example:
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x;
do {
    x = rand();
} while (x > (RAND_MAX - (((RAND_MAX % n) + 1) % n)));
x %= n;
The Extended General Solution which Allows one additional scenario of RAND_MAX+1 = n:
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x;
if (n != 0) {
    do {
        x = rand();
    } while (x > (RAND_MAX - (((RAND_MAX % n) + 1) % n)));
    x %= n;
} else {
    x = rand();
}
In some languages ( particularly interpreted languages ) doing the calculations of the compare-operation outside of the while condition may lead to faster results as this is a one-time calculation no matter how many re-tries are required. YMMV!
// Assumes:
// RAND_MAX is a globally defined constant, returned from the environment.
// int n; // User input, or externally defined, number of valid choices.
int x; // Resulting random number
int y; // One-time calculation of the compare value for x
y = RAND_MAX - (((RAND_MAX % n) + 1) % n);
if (n != 0) {
    do {
        x = rand();
    } while (x > y);
    x %= n;
} else {
    x = rand();
}
There are two usual complaints with the use of modulo.
One is valid for all generators. It is easier to see in a limit case. If your generator has a RAND_MAX of 2 (which isn't compliant with the C standard) and you want only 0 or 1 as values, using modulo will generate 0 twice as often (when the generator generates 0 and 2) as it will generate 1 (when the generator generates 1). Note that this is true as soon as you don't drop values: whatever the mapping you use from the generator values to the wanted ones, one value will occur twice as often as the other.
Some kinds of generators have their less significant bits less random than the others, at least for some of their parameters, but sadly those parameters have other interesting characteristics (such as being able to have RAND_MAX one less than a power of 2). The problem is well known, and for a long time library implementations have probably avoided it (for instance the sample rand() implementation in the C standard uses this kind of generator, but drops the 16 less significant bits), but some like to complain about that and you may have bad luck.
Using something like
int alea(int n) {
    assert(0 < n && n <= RAND_MAX);
    int partSize =
        n == RAND_MAX ? 1 : 1 + (RAND_MAX - n) / (n + 1);
    int maxUsefull = partSize * n + (partSize - 1);
    int draw;
    do {
        draw = rand();
    } while (draw > maxUsefull);
    return draw / partSize;
}
to generate a random number between 0 and n will avoid both problems (and it avoids overflow with RAND_MAX == INT_MAX)
BTW, C++11 introduced standard ways to do the reduction and other generators than rand().
With a RAND_MAX value of 3 (in reality it should be much higher than that but the bias would still exist) it makes sense from these calculations that there is a bias:
1 % 2 = 1
2 % 2 = 0
3 % 2 = 1
random_between(1, 3) % 2 = more likely a 1
In this case, the % 2 is what you shouldn't do when you want a random number between 0 and 1. You could get a random number between 0 and 2 by doing % 3 though, because in this case: RAND_MAX is a multiple of 3.
Another method
There are much simpler approaches, but to add to the other answers, here is my solution to get a random number between 0 and n - 1, so n different possibilities, without bias.
the number of bits (not bytes) needed to encode the number of possibilities is the number of bits of random data you'll need
encode the number from random bits
if this number is >= n, restart (no modulo).
Really random data is not easy to obtain, so why use more bits than needed.
Below is an example in Smalltalk, using a cache of bits from a pseudo-random number generator. I'm no security expert so use at your own risk.
next: n
    | bitSize r from to |
    n < 0 ifTrue: [^0 - (self next: 0 - n)].
    n = 0 ifTrue: [^nil].
    n = 1 ifTrue: [^0].
    cache isNil ifTrue: [cache := OrderedCollection new].
    cache size < (self randmax highBit) ifTrue: [
        Security.DSSRandom default next asByteArray do: [ :byte |
            (1 to: 8) do: [ :i | cache add: (byte bitAt: i)]
        ]
    ].
    r := 0.
    bitSize := n highBit.
    to := cache size.
    from := to - bitSize + 1.
    (from to: to) do: [ :i |
        r := r bitAt: i - from + 1 put: (cache at: i)
    ].
    cache removeFrom: from to: to.
    r >= n ifTrue: [^self next: n].
    ^r
Modulo reduction is a commonly seen way to make a random integer generator avoid the worst case of running forever.
When the range of possible integers is unknown, however, there is no way in general to "fix" this worst case of running forever without introducing bias. It's not just modulo reduction (rand() % n, discussed in the accepted answer) that will introduce bias this way, but also the "multiply-and-shift" reduction of Daniel Lemire, or if you stop rejecting an outcome after a set number of iterations. (To be clear, this doesn't mean there is no way to fix the bias issues present in pseudorandom generators. For example, even though modulo and other reductions are biased in general, they will have no issues with bias if the range of possible integers is a power of 2 and if the random generator produces unbiased random bits or blocks of them.)
The following answer of mine discusses the relationship between running time and bias in random generators, assuming we have a "true" random generator that can produce unbiased and independent random bits. The answer doesn't even involve the rand() function in C because it has many issues. Perhaps the most serious here is the fact that the C standard does not explicitly specify a particular distribution for the numbers returned by rand(), not even a uniform distribution.
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?
As the accepted answer indicates, "modulo bias" has its roots in the low value of RAND_MAX. He uses an extremely small value of RAND_MAX (10) to show that if RAND_MAX were 10 and you tried to generate a number between 0 and 2 using %, the following outcomes would result:
rand() % 3 // if RAND_MAX were only 10, gives
output of rand() | rand()%3
0 | 0
1 | 1
2 | 2
3 | 0
4 | 1
5 | 2
6 | 0
7 | 1
8 | 2
9 | 0
So there are 4 outputs of 0's (4/10 chance) and only 3 outputs of 1 and 2 (3/10 chances each).
So it's biased. The lower numbers have a better chance of coming out.
But that only shows up so obviously when RAND_MAX is small. Or more specifically, when the number you are modding by is large compared to RAND_MAX.
A much better solution than looping (which is insanely inefficient and shouldn't even be suggested) is to use a PRNG with a much larger output range. The Mersenne Twister algorithm has a maximum output of 4,294,967,295. As such, doing MersenneTwister::genrand_int32() % 10 will, for all intents and purposes, be equally distributed, and the modulo bias effect will all but disappear.
I just wrote some code for von Neumann's Unbiased Coin Flip Method, which should theoretically eliminate any bias in the random number generation process. More info can be found at http://en.wikipedia.org/wiki/Fair_coin.
int unbiased_random_bit() {
    int x1, x2, prev;
    prev = 2;
    x1 = rand() % 2;
    x2 = rand() % 2;

    for (;; x1 = rand() % 2, x2 = rand() % 2)
    {
        if (x1 ^ x2)       // 01 -> 1, or 10 -> 0.
        {
            return x2;
        }
        else if (x1 & x2)
        {
            if (!prev)     // 0011
                return 1;
            else
                prev = 1;  // 1111 -> continue, bias unresolved
        }
        else
        {
            if (prev == 1) // 1100
                return 0;
            else           // 0000 -> continue, bias unresolved
                prev = 0;
        }
    }
}

Most efficient way to evaluate a binary scalar product mod 2

I am currently performing Fourier transforms for some physics problem, and a huge bottleneck of my algorithm comes from the evaluation of a scalar product modulo 2.
For a given integer N, I have to represent all the numbers in binary up to 2^N-1.
For each of these numbers, represented as a binary vector (e.g. 15 = 2^3 + 2^2 + 2^1 + 2^0 = (1,1,1,1,0,...,0)), I have to evaluate its scalar products with all numbers from 0 to 2^N-1 in binary form, modulo 2.
(For example, the scalar product 1.15 = (1,0,0,...,0).(1,1,1,1,0,...,0) = 1*1 + 1*0 + ... = 1 mod 2.)
Note that the components are kept in binary form during the reduction modulo 2:
(1,1).(1,1) = 1*1 + 1*1 and not 1*1 + 2*2
This is basically 2^(2N) scalar products that I have to perform and reduce modulo 2.
I am having difficulty getting beyond N = 18.
I was wondering whether some clever mathematical trick can be used to greatly reduce the time spent doing them.
I was thinking of some kind of recursion (i.e. saving results for N in a file and deduce the results for N+1) but I am not sure this would help. Indeed, with this recursion, knowing the results for N, I could cut the vector for N+1 corresponding to the N part plus an additional digit, but then at each scalar product, instead of evaluating the scalar product, I would have to tell my computer to go and read a big file (because I probably wouldn't be able to keep it all in dynamic memory), which is probably time-consuming, perhaps more than the ~20 multiplications I have to perform for each of the products.
Is there any known optimized number-theoretical algorithm allowing the evaluation of such a scalar product modulo 2 very quickly ? Are there any rules or ideas I am not aware of that I could exploit ?
Sorry for the terrible formatting, I just can't get LaTeX to work in here.
The sum of the product of corresponding bits, modulo 2, will be equal to the number of 1 bits in the AND of the two numbers, modulo 2.
As you can get the binary representation of a number easily, it might not be necessary to actually create an array of bits for them; you can just use the integer data type in your programming language, which allows for at least 32 bits. Many languages offer bit operators, such as AND (&) and XOR (^).
Counting the 1 bits in a number can be done with the variable-precision SWAR algorithm.
Here is a program in Python that calculates this product modulo 2 for 2 numbers:
def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24

def product(a, b):
    return numberOfSetBits(a & b) % 2
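For example, reusing the question's 1.15 case with the functions above:
print(product(1, 15))    # 1, matching the example in the question
print(product(3, 5))     # 3 & 5 = 1, one set bit, so 1
print(product(6, 9))     # 6 & 9 = 0, no set bits, so 0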
Instead of counting the bits with numberOfSetBits, you could fold the bits together with XORs, first the 16 most significant bits with the 16 least significant bits, then of that result the 8 most significant with the 8 least significant bits, until you have one bit left. Again in Python:
def bitParity(i):
    i = (i >> 16) ^ i
    i = (i >> 8) ^ i
    i = (i >> 4) ^ i
    i = (i >> 2) ^ i
    i = (i >> 1) ^ i
    return i % 2

def product(a, b):
    return bitParity(a & b)
If you change the order in which you evaluate these pairs (a matrix of size 2^n x 2^n), then you can efficiently figure out which products-mod-2 change in each row of your evaluation.
Using Gray code, you can iterate over each value from 0 ... 2^n - 1 in a special order where only 1 bit of the outer-loop value changes each time. You can store 1 bit for each value from 0 ... 2^n - 1 representing the previous row's product-mod-2 values, and then change it based on whether the changing bit has any effect, which it only does when the corresponding bit in the other (inner loop) number is 1 (if it's 0 then the binary AND will be 0 no matter what the value of the other bit).
In C:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int N = 5;
    int max = (1 << N) - 1;
    // one bit per value of b, holding the previous row's product-mod-2
    unsigned char* prev = calloc((1 << N) / 8, 1);

    // for the first row all the products will be zero, so start at row 1
    for (int a = 1; a <= max; a++)
    {
        int grey = a ^ (a >> 1);  // compute the grey code
        int prev_grey = (a - 1) ^ ((a - 1) >> 1);
        int changed_bit = grey ^ prev_grey;
        for (int b = 0; b <= max; b++)
        {
            // the product will be changed only if b has a 1 at the same place
            // (otherwise it will be 0 regardless)
            if (b & changed_bit)
            {
                prev[b >> 3] ^= (1 << (b & 7));
            }
            int mod = (prev[b >> 3] & (1 << (b & 7))) != 0;
            printf("mod value of %d and %d is %d\n", grey, b, mod);
        }
    }
    free(prev);
    return 0;
}
The inner loop can be optimized even more because you can easily figure out which values of b have a non-zero value in the position of the changed bit: for example if it's in position 10, then b's bit there is 0 for runs of 1024 consecutive values and 1 for the next 1024, and so on. So you know that you have 1024 values in a row where the product-mod-2 is the same as in the previous row, etc. It's not clear to me if this helps you though, because I don't know what you are doing with these products.
The inner loop could also be unrolled (e.g. 32 or 64 times) so that you don't read and write to the prev array each time, but rather process blocks of 32 or 64 bits at a time.

Understanding parity of a number

I'm going through "Elements of Programming Interviews" and the very first question is about computing the parity of a number ( whether the number of 1's in the binary representation is even or odd). The final solution provided does this:
short Parity(unsigned long x) {
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x &= 0xf;
    ...
I understand that with that final value of x you can lookup the answer in a lookup table = 0x6996. But my question is why does the above code work? I've worked a 16 bit example out by hand and it does give the correct parity, I just don't understand it conceptually.
The lookup table is confusing. Let's drop it and keep going:
    ...
    x ^= x >> 2;
    x ^= x >> 1;
    p = x & 1;
To figure it out, let's start with the 1-bit case. In the 1-bit case, p=x, so if x is 1 the parity is odd, and if x is 0 the parity is even. Trivial.
Now the two bit case. You look at b0^b1 (bit 0 XOR bit 1) and that's the parity. If they're equal, the parity is even, otherwise it's odd. Simple, too.
Now let's add more bits. In the four bit example, b3 b2 b1 b0, we calculate x ^ (x >> 2), which gives us the two bit number b3^b1 b2^b0. These are actually two parities: one for the odd bits of the original number, and one for the even bits. XOR the two parities and we get the original parity.
And now this goes on and on for as many bits as you want.
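As a sanity check, here is the complete fold in Python (64-bit width and non-negative x assumed, no lookup table), compared against a direct bit count:
def parity(x):
    x ^= x >> 32
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return x & 1

for x in (0, 1, 0b10110001, 0xDEADBEEFCAFEF00D):
    assert parity(x) == bin(x).count('1') % 2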
It works because,
the parity of a single bit is itself (base case)
the parity of the concatenation of bitstrings x and y is the xor of the parity of x and the parity of y
That gives a recursive algorithm by splitting every string down the middle until it's a base case, which you can then group by layer and flip upside down to get the code shown in the question... sort of, because it ends early and apparently does the last step by lookup.
For an n bit number x, the answer is always in the rightmost n/2^z bits of x after z iterations.
Let's take an example where n=8 and x=10110001 (b7, b6, b5, b4, b3, b2, b1, b0).
Its actual/correct answer is even parity.
After 1 iteration
10110001
00001011
___________
x = 10111010
Rightmost 8/2^1 = 4 digits of x = 1010 (even parity)
After 2 iterations
10111010
00101110
___________
x = 10010100
Rightmost 8/2^2 = 2 digits of x = 00 (even parity)
After 3 iterations
10010100
01001010
___________
x = 11011110
Rightmost 8/2^3 = 1 digit of x = 0 (even parity)
Now we can extract any dth digit of a number b by ANDing it with a number q in which only the dth digit is 1 (one) and all other digits are 0 (zero).
Here we want to extract the 0th (rightmost) digit of the final value of x.
So let's AND the final value of x (11011110), obtained after log2(n) = 3 iterations, with 00000001 to get the answer.
11011110
00000001
_________________
00000000 (even parity)

Bitwise and in place of modulus operator

We know that, for example, modulo of a power of two can be expressed like this:
x % 2^n == x & (2^n - 1)
Examples:
x % 2 == x & 1
x % 4 == x & 3
x % 8 == x & 7
What about general nonpower of two numbers?
Let's say:
x % 7==?
First of all, it's actually not accurate to say that
x % 2 == x & 1
Simple counterexample: x = -1. In many languages, including Java, -1 % 2 == -1. That is, % is not necessarily the traditional mathematical definition of modulo. Java calls it the "remainder operator", for example.
With regards to bitwise optimization, only modulo powers of two can "easily" be done in bitwise arithmetics. Generally speaking, only modulo powers of base b can "easily" be done with base b representation of numbers.
In base 10, for example, for non-negative N, N mod 10^k is just taking the least significant k digits.
References
JLS 15.17.3 Remainder Operator %
Wikipedia/Modulo Operation
There is a simple way to find the modulo only for 2^i numbers (powers of two) using bitwise operations.
There is an ingenious way to solve Mersenne cases, as per the link, such as n % 3, n % 7...
There are special cases for n % 5, n % 255, and composite cases such as n % 6.
For cases 2^i, ( 2, 4, 8, 16 ...)
n % 2^i = n & (2^i - 1)
More complicated ones are hard to explain. Read up only if you are very curious.
This only works for powers of two (and frequently only positive ones) because they have the unique property of having only one bit set to '1' in their binary representation. Because no other class of numbers shares this property, you can't create bitwise-and expressions for most modulus expressions.
This is specifically a special case because computers represent numbers in base 2. This is generalizable:
(number in base b) % b^x
is equivalent to the last x digits of (number in base b).
There are moduli other than powers of 2 for which efficient algorithms exist.
For example, if x is a 32 bit unsigned int, then
x % 3 can be computed from popcnt(x & 0x55555555) - popcnt(x & 0xaaaaaaaa),
which is congruent to x modulo 3 (even bit positions contribute 1 and odd positions contribute -1 mod 3); the small difference then just needs a final reduction mod 3.
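A quick Python check of that congruence over a range of values (popcnt is emulated here with bin().count):
def popcnt(v):
    return bin(v).count('1')

def mod3(x):
    # 4**k == 1 and 2 * 4**k == -1 (mod 3), hence the even/odd bit masks
    return (popcnt(x & 0x55555555) - popcnt(x & 0xAAAAAAAA)) % 3

assert all(mod3(x) == x % 3 for x in range(100000))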
Using only the bitwise-and (&) operator in binary, there is not. Sketch of proof:
Suppose there were a value k such that x & k == x % (k + 1), but k != 2^n - 1. Then if x == k, the expression x & k seems to "operate correctly" and the result is k. Now, consider x == k-i: if there were any "0" bits in k, there is some i greater than 0 for which k-i can only be expressed with 1-bits in those positions. (E.g., 1011 (11) must become 0111 (7) when 100 (4) is subtracted from it; here the 0 in position 2 becomes a 1 when i = 4.) If a bit of k must change from zero to one to represent k-i, then x & k cannot correctly calculate x % (k+1), which in this case should be k-i, because there is no way for a bitwise AND to produce a 1 in a position where the mask k has a 0.
Modulo "7" without "%" operator
int a = x % 7;
int a = (x + x / 7) & 7;
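A quick check of that identity for non-negative x, as a Python sketch (// is integer division):
assert all((x + x // 7) & 7 == x % 7 for x in range(100000))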
In this specific case (mod 7), we still can replace %7 with bitwise operators:
// Return x % 7 for x >= 0.
int mod7(int x)
{
    while (x > 7) x = (x & 7) + (x >> 3);
    return (x == 7) ? 0 : x;
}
It works because 8%7 = 1. Obviously, this code is probably less efficient than a simple x%7, and certainly less readable.
Using bitwise_and, bitwise_or, and bitwise_not you can modify any bit configuration into another (i.e. this set of operators is "functionally complete"). However, for operations like modulus, the general formula would necessarily be quite complicated; I wouldn't even bother trying to recreate it.
