Is there a cheap way to obfuscate a 64bit int? [duplicate]

I'm looking to create a deterministic number generation function where the input number will always generate the same number, but no two numbers will end up generating the same result.
E.g.:
1 -> 3
2 -> 5
3 -> 4
4 -> 2
5 -> 1
However I need this to work for all numbers that can be represented by a specific datatype, e.g. an int64.
This feels like something that should either be really straightforward or completely impossible. Is there some random number generation scheme that guarantees this sort of distribution without me having to create an array of all possible numbers, randomly sort it, and then use the index (and running out of memory in the process)?
Many thanks
F

The transformation formula you need is:
f(P) = (mP + s) mod n
// n = range - so for uint64 2^64
// s < range i.e. < 2^64
// m must be coprime with n
This is modular arithmetic as used in the Affine cipher.
The mod ensures the result is within the desired range, s is a simple shift, and m must be coprime with n.
Coprime simply means n and m should share no common factors (other than 1).
Since n is 2^64, its only prime factor is 2, so m simply must not be even (i.e. not divisible by 2).
So for uint64 range:
var (
	m = uint64(39293)    // some non-even number
	s = uint64(75321908) // some random number below 2^64
)

func transform(p uint64) uint64 {
	return p*m + s // implicitly reduced mod 2^64 by the type's size
}
This may appear magic, but you can convince yourself it works with uint16:
https://go.dev/play/p/EKB6SH3-SGu
(exhaustively checking uint64 would take quite some resources to run :-)
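If you'd rather not click through, here's a self-contained sketch of the same exhaustive uint16 check (the constants here are my own; any odd m works):

package main

import "fmt"

func main() {
	const (
		m uint16 = 39293 // odd, hence coprime with 2^16
		s uint16 = 12345 // arbitrary shift
	)
	seen := make([]bool, 1<<16)
	for p := 0; p < 1<<16; p++ {
		v := uint16(p)*m + s // wraps mod 2^16 automatically
		if seen[v] {
			fmt.Println("collision at", p)
			return
		}
		seen[v] = true
	}
	fmt.Println("no collisions: all 65536 outputs are distinct")
}

Flip m to any even number and the check reports a collision, which is the coprimality requirement at work.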
Update:
For signed numbers (i.e. int64) the logic is no different. Since we know we have a unique one-to-one mapping over uint64, one approach is simply to convert inputs & outputs between int64 and uint64:
// original unsigned version
func transform(p uint64) uint64 {
	return m*p + s
}

func signedTransform(p int64) int64 {
	return int64(transform(uint64(p)))
}
Again, here's an int16 example proving there are no collisions:
https://go.dev/play/p/Fkw5FLMK0Fu

To add to colm.anseo's answer, this kind of mapping is also known as a linear congruential generator (LCG).
X_{n+1} = (a·X_n + c) mod m
When c ≠ 0, correctly chosen parameters allow a period equal to m, for all seed values. This will occur if and only if:
m and c are relatively prime,
a-1 is divisible by all prime factors of m,
a-1 is divisible by 4 if m is divisible by 4.
These three requirements are referred to as the Hull–Dobell Theorem.
For a 64-bit LCG, a = 6364136223846793005 and c = 1442695040888963407 from Knuth look good.
Note that the LCG mapping is 1-to-1: it maps the whole [0...2^64-1] range to itself. You could invert it if you want. And as an RNG it has the distinctive ability to jump ahead.
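To make the inversion claim concrete, here is a small Go sketch (mine, not from the answer) that applies one LCG step with the Knuth constants and then undoes it via the modular inverse of a, which exists because a is odd:

package main

import "fmt"

const (
	a uint64 = 6364136223846793005 // multiplier (Knuth)
	c uint64 = 1442695040888963407 // increment (Knuth)
)

// next applies one LCG step; uint64 arithmetic is implicitly mod 2^64.
func next(x uint64) uint64 { return a*x + c }

// modInverse computes m^-1 mod 2^64 by Newton's iteration; m must be odd.
func modInverse(m uint64) uint64 {
	inv := m // correct in the low 3 bits, since m*m ≡ 1 (mod 8) for odd m
	for i := 0; i < 5; i++ {
		inv *= 2 - m*inv // each step doubles the number of correct bits
	}
	return inv
}

// prev undoes one step: x = a*p + c implies p = a^-1 * (x - c).
func prev(x uint64) uint64 { return modInverse(a) * (x - c) }

func main() {
	x := uint64(12345)
	fmt.Println(next(x), prev(next(x)) == x) // the mapping round-trips: true
}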

Related

How to subdivide integer hash into ranges

I have an unsigned 64-bit number representing a mantissa, or fraction (covering the range [0..1), where 0.0 maps to 0 and 0xffffff.. maps to a number "just before 1.0").
Now I want to split this range into equal buckets and answer: given a random number key, which part of the range does it fall into?
It's easier to see from the following code:
func BucketIndex(key, buckets uint64) uint64 {
	return uint64(float64(key) / (math.Pow(2, 64) / float64(buckets)))
}
My attempt to "hack this over" was to split 2^64 in two: reducing the range to 32 bits while operating in 64 bits to do the math:
// ~=key / ((1 << 64) / buckets)
return ((key >> 32) * buckets) >> 32
but the ranges stopped being equal, e.g. one third (buckets == 3) falls at 0x5555555600000000 instead of 0x5555555555555556.
That's a sad story, so I'm asking: do you know a better method of finding (1 << 64) / buckets?
If buckets is a (compile-time) constant, you may use a constant expression to calculate the bucket size: Go constants have arbitrary precision. Otherwise you may use big.Int to calculate it at runtime, and store the result (so you don't have to repeat big.Int calculations every time).
Using a constant expression, at compile-time
To achieve an integer division rounding up, add divisor - 1 to the dividend:
const (
	max        = math.MaxUint64 + 1
	buckets    = 3
	bucketSize = uint64((max + buckets - 1) / buckets)
)
Using big.Int, at runtime
We can use the same logic with big.Int too. An alternative would be to use Int.DivMod() and, if the mod is greater than zero, increment the result by 1. Note that here max is math.MaxUint64, one less than the range size 2^64, so to get the same round-up dividend as above we add buckets (not buckets - 1):
func calcBucketSize(max, buckets *big.Int) uint64 {
	// dividend = (max+1) + buckets - 1 = max + buckets
	max = max.Add(max, buckets)
	return max.Div(max, buckets).Uint64()
}
var bucketSize = calcBucketSize(new(big.Int).SetUint64(math.MaxUint64), big.NewInt(3))
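With bucketSize precomputed (either way), the original BucketIndex collapses to a single integer division; a minimal sketch, with naming borrowed from the question:

// BucketIndex reports which of the equal-width buckets key falls into.
func BucketIndex(key, bucketSize uint64) uint64 {
	return key / bucketSize
}

For buckets == 3 this puts 0x5555555555555555 into bucket 0 and 0x5555555555555556 into bucket 1, exactly the boundary the question asked for.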


Inner workings of `rand.Intn` function - GoLang

Somehow, I happened to look at the Go source code to see how it implements the random function when passed the length of an array.
Here's the calling code
func randomFormat() string {
	formats := []string{
		"Hi, %v. Welcome!",
		"Great to see you, %v!",
		"Hail, %v! Well met!",
	}
	return formats[rand.Intn(len(formats))]
}
Go Source code: main part
func (r *Rand) Intn(n int) int {
	if n <= 0 {
		panic("invalid argument to Intn")
	}
	if n <= 1<<31-1 {
		return int(r.Int31n(int32(n)))
	}
	return int(r.Int63n(int64(n)))
}
Go source code: reference part - most devs already have this on their machines or in the Go repo.
// Int31n returns, as an int32, a non-negative pseudo-random number in [0,n).
// It panics if n <= 0.
func (r *Rand) Int31n(n int32) int32 {
	if n <= 0 {
		panic("invalid argument to Int31n")
	}
	if n&(n-1) == 0 { // n is power of two, can mask
		return r.Int31() & (n - 1)
	}
	max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
	v := r.Int31()
	for v > max {
		v = r.Int31()
	}
	return v % n
}

// Int63n returns, as an int64, a non-negative pseudo-random number in [0,n).
// It panics if n <= 0.
func (r *Rand) Int63n(n int64) int64 {
	if n <= 0 {
		panic("invalid argument to Int63n")
	}
	if n&(n-1) == 0 { // n is power of two, can mask
		return r.Int63() & (n - 1)
	}
	max := int64((1 << 63) - 1 - (1<<63)%uint64(n))
	v := r.Int63()
	for v > max {
		v = r.Int63()
	}
	return v % n
}

func (r *Rand) Int31() int32 { return int32(r.Int63() >> 32) }

func (r *Rand) Int63() int64 { return r.src.Int63() }

type Source interface {
	Int63() int64
	Seed(seed int64)
}
I want to understand how the random function works, including all the inner functions. I am overwhelmed by the code; if someone were to lay the steps out in plain English, what would they be?
For example, I don't get the logic for doing minus 1 in
if n <= 1<<31-1
Then, I can't make head or tail of the Int31n function:
if n&(n-1) == 0 { // n is power of two, can mask
	return r.Int31() & (n - 1)
}
max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
v := r.Int31()
for v > max {
	v = r.Int31()
}
return v % n
This is more of a question about algorithms than it is about Go, but there are some Go parts. In any case I'll start with the algorithm issues.
Shrinking the range of a uniform random number generator
Suppose that we have a uniform-distribution random number generator that returns a number between, say, 0 and 7 inclusive. That is, it will, over time, return about the same number of 0s, 1s, 2s, ..., 7s, but with no apparent pattern between them.
Now, if we want a uniformly distributed random number between 0 and 7, this thing is perfect. That's what it returns. We just use it. But what if we want a uniformly distributed random number between 0 and 6 instead?
We could write:
func randMod7() int {
	return generate() % 7
}
so that if generate() returns 7 (which it has a 1 out of 8 chance of doing), we convert that value to zero. But then we'll get zero back 2 out of 8 times, instead of 1 out of 8 times. We'll get 1, 2, 3, 4, 5, and 6 back 1 out of 8 times, and zero 2 out of 8 times, on average: once for each actual zero, and once for each 7.
What we need to do, then, is throw away any occurrences of 7:
func randMod7() int {
	for {
		if i := generate(); i < 7 {
			return i
		}
		// oops, got 7, try again
	}
}
Now, if we had a uniform random number generator named generate() that returned a value between 0 and (say) 11 (12 possible values) and we wanted a value between 0 and 3 (four possible values), we could just use generate() % 4, because the 12 possible results would fall into 3 groups of four with equal probability. If we wanted a value between 0 and 5 inclusive, we could use generate() % 6, because the 12 possible results would fall into two groups of 6 with equal probability. In fact, all we need to do is examine the prime factorization of the range of our uniform number generator to see which moduli work. The factors of 12 are 2, 2, 3; so 2, 3, 4, and 6 all work here. Any other modulus, such as generate() % 10, produces a biased result: 0 and 1 occur 2 out of 12 times, but 2 through 9 occur 1 out of 12 times. (Note: generate() % 12 also works, but is kind of pointless.)
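A quick way to convince yourself of the generate() % 10 example (my own sketch, not from the answer): enumerate all 12 equally likely outputs and tally the residues.

package main

import "fmt"

func main() {
	counts := make([]int, 10)
	for v := 0; v < 12; v++ { // every equally likely output of generate()
		counts[v%10]++
	}
	fmt.Println(counts) // [2 2 1 1 1 1 1 1 1 1]: 0 and 1 occur twice as often
}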
In our particular case, we have two different uniform random number generators available. One, Int31(), produces values between 0 and 0x7fffffff (2147483647 decimal, or 2^31 - 1, or 1<<31 - 1) inclusive. The other, Int63(), produces values between 0 and 0x7fffffffffffffff (9223372036854775807, or 2^63 - 1, or 1<<63 - 1). These are ranges that hold 2^31 and 2^63 values respectively, and hence their prime factorizations are 31 2s and 63 2s.
What this means is that we can compute Int31() mod 2^k, for any integer k from zero to 31 inclusive, without messing up our uniformity. With Int63(), we can do the same with k ranging all the way up to 63.
Introducing the computer
Now, mathematically-and-computer-ly speaking, given any nonnegative integer n in [0..0x7fffffff] or [0..0x7fffffffffffffff], and a non-negative integer k in the right range (no more than 31 or 63 respectively), computing n mod 2^k produces the same result as doing a bit-mask operation with the low k bits set. To get that number of set bits, we take 1<<k and subtract 1. If k is, say, 4, we get 1<<4 or 16. Subtracting 1, we get 15, or 0xf, which has four 1 bits in it.
So:
n % (1 << k)
and:
n & (1<<k - 1)
produce the same result. Concretely, when k==4, this is n%16 or n&0xf. When k==5 this is n%32 or n&0x1f. Try it for k==0 and k==63.
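And if you would rather brute-force it than try values by hand, here's a sanity-check sketch; it works in uint64 so that 1<<63 doesn't overflow:

package main

import (
	"fmt"
	"math/rand"
)

func main() {
	for i := 0; i < 1000; i++ {
		n := rand.Uint64()
		for k := uint(0); k <= 63; k++ {
			if n%(1<<k) != n&(1<<k-1) {
				fmt.Println("mismatch at", n, k)
				return
			}
		}
	}
	fmt.Println("n % (1<<k) == n & (1<<k - 1) held for every sample")
}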
Introducing Go-the-language
We're now ready to consider doing all of this in Go. We note that int (plain, unadorned int) is guaranteed to be able to hold values between -2147483648 and +2147483647 (-0x80000000 through +0x7fffffff). It may extend all the way to -0x8000000000000000 through +0x7fffffffffffffff.
Meanwhile, int32 always handles the smaller range and int64 always handles the larger range. The plain int is a different type from these other two, but implements the same range as one of the two. We just don't know which one.
Our Int31 implementation returns a uniformly distributed random number in the 0..0x7fffffff range. (It does this by taking the high 31 bits of r.Int63(), though this is an implementation detail.) Our Int63 implementation returns a uniformly distributed random number in the 0..0x7fffffffffffffff range.
The Intn function you show here:
func (r *Rand) Intn(n int) int {
	if n <= 0 {
		panic("invalid argument to Intn")
	}
	if n <= 1<<31-1 {
		return int(r.Int31n(int32(n)))
	}
	return int(r.Int63n(int64(n)))
}
just picks one of the two functions, based on the value of n: if it's less than or equal to 0x7fffffff (1<<31 - 1), the result fits in int32, so it uses int32(n) to convert n to int32, calls r.Int31n, and converts the result back to int. Otherwise, the value of n exceeds 0x7fffffff, implying that int has the larger range and we must use the larger-range generator, r.Int63n. The rest is the same except for types.
The code could just do:
return int(r.Int63n(int64(n)))
every time, but on 32-bit machines, where 64-bit arithmetic may be slow, this might be slow. (There's a lot of may and might here and if you were writing this yourself today, you should start by profiling / benchmarking the code. The Go authors did do this, though this was many years ago; at that time it was worth doing this fancy stuff.)
More bit-manipulation
The insides of both functions Int31n and Int63n are quite similar; the main difference is the types involved, and then in a few places, the maximum values. Again, the reason for this is at least partly historical: on some (mostly old now) computers, the Int63n variant is significantly slower than the Int31n variant. (In some non-Go languages, we might write these as generics and have the compiler generate a type-specific version automatically.) So let's just look at the Int63n variant:
func (r *Rand) Int63n(n int64) int64 {
	if n <= 0 {
		panic("invalid argument to Int63n")
	}
	if n&(n-1) == 0 { // n is power of two, can mask
		return r.Int63() & (n - 1)
	}
	max := int64((1 << 63) - 1 - (1<<63)%uint64(n))
	v := r.Int63()
	for v > max {
		v = r.Int63()
	}
	return v % n
}
The argument n has type int64, so its value will not exceed 2^63 - 1, i.e. 0x7fffffffffffffff or 9223372036854775807. But it could be negative, and negative values won't work right, so the first thing we do is test for that and panic if so. We also panic if the input is zero (this is something of a choice, but it's useful to note it now).
Next we have the n&(n-1) == 0 test. This is a test for powers of two, with one slight flaw, and it works in many languages (those that have bit-masking):
A power of two is always represented as a single set bit in the binary representation of a number. For instance, 2 itself is 00000010₂, 4 is 00000100₂, 8 is 00001000₂, and so on, through 128 being 10000000₂. (Since I only "drew" eight bits, this series maxes out at 128.)
Subtracting 1 from that number causes a borrow: that bit goes to zero, and all the lesser bits become 1. For instance, 10000000₂ - 1 is 01111111₂.
AND-ing these two together produces zero if there was just the single bit set initially. If not—for instance, if we have the value 130 or 10000010₂ initially, subtracting 1 produces 10000001₂—there's no borrow out of the top bit, so the top bit is set in both inputs and therefore is set in the AND-ed result.
The slight flaw is that if the initial value is zero, then we have 0-1, which produces all-1s; 0 & 0xffffffffffffffff is zero too, but zero is not an integer power of two. (2^0 is 1, not 0.) This minor flaw is not important for our purpose here, because we already made sure to panic for this case: it just doesn't happen.
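Pulled out on its own (a sketch), the test looks like this, with the zero flaw made visible:

package main

import "fmt"

// isPow2 is the n&(n-1) trick with the zero case excluded explicitly.
func isPow2(n int64) bool { return n > 0 && n&(n-1) == 0 }

func main() {
	fmt.Println(isPow2(128), isPow2(130)) // true false
	var zero int64
	fmt.Println(zero&(zero-1) == 0) // true: the bare trick misclassifies 0
}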
Now we have the most complicated line of all:
max := int64((1 << 63) - 1 - (1<<63)%uint64(n))
The recurring 63s here are because we have a value range going from zero to 2^63 - 1. 1<<63 - 1 is (still, again, always) 9223372036854775807 or 0x7fffffffffffffff. Meanwhile 1<<63, without 1 subtracted from it, is 9223372036854775808 or 0x8000000000000000. This value does not fit into int64, but it does fit into uint64. So if we convert n to uint64, we can compute uint64(9223372036854775808) % uint64(n), which is what the % expression does. By using uint64 for this calculation, we ensure that it doesn't overflow.
But: what is this calculation all about? Well, go back to our example with a generate() that produces values in [0..7]. When we wanted a number in [0..5], we had to discard both 6 and 7. That's what we're going for here: we want to find the value above which we should discard values.
If we were to take 8%6, we'd get 2. 8 is one bigger than the maximum that our 3-bit generate() would generate. 8%6 == 2 is the number of "high values" that we have to discard: 8-2 = 6 and we want to discard values that are 6 or more. Subtract 1 from this, and we get 7-2 = 5; we can accept numbers in this input range, from 0 to 5 inclusive.
So, this somewhat fancy calculation for setting max is just a way to find out what the maximum value we like is. Values that are greater than max need to be tossed out.
This particular calculation works nicely even if n is much less than our generator returns. For instance, suppose we had a four-bit generator, returning values in the [0..15] range, and we wanted a number in [0..2]. Our n is therefore 3 (to indicate that we want a number in [0..2]). We compute 16%3 to get 1. We then take 15 (one less than our maximum output value) - 1 to get 14 as our maximum acceptable value. That is, we would allow numbers in [0..14], but exclude 15.
With a 63-bit generator returning values in [0..9223372036854775807], and n==3, we would set max to 9223372036854775805. That's what we want: it throws out the two biasing values, 9223372036854775806 and 9223372036854775807.
The remainder of the code simply does that:
v := r.Int63()
for v > max {
	v = r.Int63()
}
return v % n
We pick one Int63-range number. If it exceeds max, we pick another one and check again, until we pick one that is in the [0..max] range, inclusive of max.
Once we get a number that is in range, we use % n to shrink the range if needed. For instance, if the range is [0..2], we use v % 3. If v is (say) 14, 14%3 is 2. Our actual max is, again, 9223372036854775805, and whatever v is, between 0 and that, v%3 is between 0 and 2 and remains uniformly distributed, with no slight bias to 0 and 1 (9223372036854775806 would give us that one extra 0, and 9223372036854775807 would give us that one extra 1).
(Now repeat the above with int32, 31, and 1<<31, for the Int31n function.)
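To put the whole mask/reject/modulo pipeline together in miniature, here is a toy re-implementation (my own sketch, not the library's code) that uses the 4-bit generator from the examples above, so the uniformity is easy to observe:

package main

import (
	"fmt"
	"math/rand"
)

// fourBitN mimics Int63n with a 4-bit generator: uniform over [0,n), 0 < n <= 16.
func fourBitN(n int) int {
	if n&(n-1) == 0 { // n is a power of two: mask
		return rand.Intn(16) & (n - 1)
	}
	max := 16 - 1 - 16%n // e.g. n == 3: max == 14, discarding only 15
	v := rand.Intn(16)   // stand-in for the 4-bit uniform generator
	for v > max {
		v = rand.Intn(16)
	}
	return v % n
}

func main() {
	counts := make([]int, 3)
	for i := 0; i < 300000; i++ {
		counts[fourBitN(3)]++
	}
	fmt.Println(counts) // three roughly equal counts, no bias toward 0 or 1
}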

The fastest random number Generator

I'm intending to implement a random number generator in Swift 3. I have three different methods for generating an integer (between 0 and 50000) ten thousand times non-stop.
Do these generators use the same mathematical principles for generating a value, or not?
Which generator is less CPU- and RAM-intensive at runtime (over 10,000 iterations)?
method A:
var generator: Int = random() % 50000
method B:
let generator = Int(arc4random_uniform(50000))
method C:
import GameKit

let number: [Int] = Array(0...50000)

func generator() -> Int {
    let random = GKRandomSource.sharedRandom().nextInt(upperBound: number.count)
    return number[random]
}
All of these are pretty well documented, and most have published source code.
var generator: Int = random() % 50000
Well, first of all, this is modulo biased, so it certainly won't be equivalent to a proper uniform random number. The docs for random explain it:
The random() function uses a non-linear, additive feedback, random number generator, employing a default table of size 31 long integers. It returns successive pseudo-random numbers in the range
from 0 to (2**31)-1. The period of this random number generator is very large, approximately 16*((2**31)-1).
But you can look at the full implementation and documentation in Apple's source code for libc.
Contrast the documentation for arc4random_uniform (which does not have modulo bias):
These functions use a cryptographic pseudo-random number generator to generate high quality random bytes very quickly. One data pool is used for all consumers in a process, so that consumption
under program flow can act as additional stirring. The subsystem is re-seeded from the kernel random number subsystem on a regular basis, and also upon fork(2).
And the source code is also available. The important thing to note from arc4random_uniform is that it avoids modulo bias by adjusting the modulo correctly and then generating random numbers until it is in the correct range. In principle this could require generating an unlimited number of random values; in practice it is incredibly rare that it would need to generate more than one, and rare-to-the-point-of-unbelievable that it would generate more than that.
GKRandomSource.sharedRandom() is also well documented:
The system random source shares state with the arc4random family of C functions. Generating random numbers with this source modifies the outcome of future calls to those functions, and calling those functions modifies the sequence of random values generated by this source. As such, this source is neither deterministic nor independent—use it only for trivial gameplay mechanics that do not rely on those attributes.
For performance, you would expect random() to be fastest since it never seeds itself from the system entropy pool, and so it also will not reduce the entropy in the system (though arc4random only does this periodically, I believe around every 1.5MB or so of random bytes generated; not for every value). But as with all things performance, you must profile. Of course since random() does not reseed itself it is less random than arc4random, which is itself less random than the source of entropy in the system (/dev/random).
When in doubt, if you have GameplayKit available, use it. Apple selected the implementation of sharedRandom() based on what they think is going to work best in most cases. Otherwise use arc4random. But if you really need to minimize impact on the system for "pretty good" (but not cryptographic) random numbers, look at random. If you're willing to take "kinda random if you don't look at them too closely" numbers and have even less impact on the system, look at rand. And if you want almost no impact on the system (guaranteed O(1), inlineable), see XKCD's getRandomNumber().
Xorshift generators are among the fastest non-cryptographically-secure random number generators, requiring very small code and state.
An example Swift implementation of xorshift128+:
func xorshift128plus(seed0: UInt64, seed1: UInt64) -> () -> UInt64 {
    var s0 = seed0
    var s1 = seed1
    if s0 == 0 && s1 == 0 {
        s1 = 1 // the state must be seeded so that it is not everywhere zero
    }
    return {
        var x = s0
        let y = s1
        s0 = y
        x ^= x << 23
        x ^= x >> 17
        x ^= y
        x ^= y >> 26
        s1 = x
        return s0 &+ s1
    }
}
// create random generator, seed as needed!!
let random = xorshift128plus(seed0: 0, seed1: 0)

for _ in 0..<100 {
    // and use it later
    _ = random()
}
to avoid modulo bias, you could use
func random_uniform(bound: UInt64) -> UInt64 {
    var u: UInt64 = 0
    let b: UInt64 = (u &- bound) % bound // = 2^64 mod bound: the biased low region to reject
    repeat {
        u = random()
    } while u < b
    return u % bound
}
in your case
let r_number = random_uniform(bound: 50000) // r_number from interval 0..<50000

Encode number to a result

In my app I need to run a 5-digit number through an algorithm and return a number within a given interval, i.e.:
The function encode gets 3 parameters: the 5-digit initial number, the interval's lower limit, and its upper limit. For example:
int res = encode(12879, 10, 100) returns 83.
The function starts from 12879, does something with the number, and returns a value between 10 and 100. This mustn't be random: every time I pass the number 12879, the encode function must return the same number.
Any ideas?
Thanks,
Direz
One possible approach:
compute the range of your interval R = (100 - 10) + 1
compute a hash modulo R of the input H = hash(12879) % R
add the lower bound to the modular hash V = 10 + H
Here's the thing, though - you haven't defined any constraints or requirements on the "algorithm" that produces the result. If all you want is to map a value into a given range (without any knowledge of the distribution of the input, how input values may cluster, etc.), you could just as easily take the input modulo the range without hashing (as Foo Bah demonstrates).
If there are certain constraints, requirements, or distributions of the input or output of your encode method, then the approach may need to be quite different. However, you are the only one who knows what additional requirements you have.
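As a concrete illustration of those three steps in Go (a sketch only: the answer doesn't prescribe a hash, so FNV-1a is my arbitrary choice here):

package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// encode deterministically maps n into [lo, hi] via hash-then-modulo.
func encode(n uint64, lo, hi int) int {
	r := uint64(hi - lo + 1) // R: size of the inclusive interval
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], n)
	h := fnv.New64a()
	h.Write(buf[:])
	return lo + int(h.Sum64()%r) // V = lo + H
}

func main() {
	fmt.Println(encode(12879, 10, 100)) // same input, same output, every run
}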
You can do something simple like
encode(x, y, z) --> y + (x mod (z - y))
(use z - y + 1 as the modulus if the upper limit z should itself be attainable)
You don't have an upper limit for this function?
Assume it is 99999 because it is 5 digits. For your case, the simplest way is:
int encode(double N, double L, double H)
{
    // linearly map N from [10000, 99999] onto [L, H]
    return (int)(((H - L) / (99999 - 10000)) * (N - 10000) + L);
}
