31-bit Bijective (Perfect) Hash algorithm

What I need
I need an algorithm that produces a bijective output. I have a 31-bit input and need a pseudo-random 31-bit output.
What I have considered
CRCs are bijective within their bit-width.
I have looked on Google and can find the polynomials for this, but not the tables or algorithm.
Could anyone point me in the right direction?
I need a CRC-31 algorithm using polynomial say 0x737e312b, or any bijective function that will do what I need.
NOTE
I found the following code, but I unfortunately don't have the tools to compile and run it.

For any hash function hash, you can do:
function bijectiveHash31(int val) {
val &= 0x7FFFFFFF; //make sure it's 31 bits
for (int i=0; i<5; ++i) {
// the high bits affect the low bits
val ^= hash(val>>15) & 32767;
// rotate bits
val = ((val&32767)<<16) | ((val>>15)&65535);
}
return val;
}
This is a Feistel structure, which forms the basis of many ciphers: https://en.wikipedia.org/wiki/Feistel_cipher
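To make the invertibility concrete, here is a minimal C++ sketch of the same structure together with its inverse. The round function `roundHash` and its multiplier are assumptions chosen only for illustration (any function of the high 16 bits would do), and `feistelUnhash31` is a hypothetical name, not part of the original answer.

```cpp
#include <cstdint>

// Hypothetical round function: it does not need to be invertible itself.
inline uint32_t roundHash(uint32_t x) {
    x *= 0x9E3779B1u;
    return x ^ (x >> 15);
}

// Forward direction: same steps as the snippet above, on unsigned 32-bit values.
inline uint32_t feistelHash31(uint32_t val) {
    val &= 0x7FFFFFFFu;  // keep 31 bits
    for (int i = 0; i < 5; ++i) {
        // the high 16 bits affect the low 15 bits
        val ^= roundHash(val >> 15) & 32767u;
        // rotate: low 15 bits move to the top, high 16 bits to the bottom
        val = ((val & 32767u) << 16) | ((val >> 15) & 65535u);
    }
    return val;
}

// Inverse direction: undo the rotation, then undo the XOR, five times.
inline uint32_t feistelUnhash31(uint32_t val) {
    for (int i = 0; i < 5; ++i) {
        val = ((val & 65535u) << 15) | ((val >> 16) & 32767u);
        // the high 16 bits were untouched by the XOR, so the same mask cancels it
        val ^= roundHash(val >> 15) & 32767u;
    }
    return val;
}
```

Because every input can be recovered from its output, no two 31-bit inputs can collide.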
If you need it to be fast and you don't need it to be super good, then this works fine:
function bijectiveHash31(int val) {
val = ((val*RANDOM_ODD_NUMBER) + RANDOM_NUMBER) & 0x7FFFFFFF;
val ^= (val>>15);
val ^= (val>>8);
return val;
}
In both of these cases, it's not too difficult to figure out how you could undo each elementary operation, which shows that the whole hash is bijective. If you need help establishing that for the multiplication, see https://en.wikipedia.org/wiki/Modular_multiplicative_inverse
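As a sketch of that argument for the fast version, the C++ below pairs the hash with an explicit inverse. The constants `kMul` and `kAdd` are arbitrary placeholders (the only requirement is that `kMul` is odd), and `inverseOdd` and `inverseHash31` are hypothetical helper names introduced here.

```cpp
#include <cstdint>

const uint32_t kMask31 = 0x7FFFFFFFu;
const uint32_t kMul = 0x2545F491u;  // assumed constant; must be odd
const uint32_t kAdd = 0x1B873593u;  // assumed constant; any value works

// Multiplicative inverse of an odd number modulo 2^32 (Newton iteration).
// An odd a satisfies a*a == 1 (mod 8); each step doubles the correct bits.
inline uint32_t inverseOdd(uint32_t a) {
    uint32_t inv = a;
    for (int i = 0; i < 4; ++i) inv *= 2u - a * inv;
    return inv;
}

inline uint32_t bijectiveHash31(uint32_t val) {
    val = (val * kMul + kAdd) & kMask31;
    val ^= val >> 15;
    val ^= val >> 8;
    return val;
}

inline uint32_t inverseHash31(uint32_t h) {
    // undo val ^= val >> 8
    h ^= h >> 8;
    h ^= h >> 16;
    // undo val ^= val >> 15
    h ^= h >> 15;
    h ^= h >> 30;
    // undo the affine step modulo 2^31
    return ((h - kAdd) * inverseOdd(kMul)) & kMask31;
}
```

An inverse modulo 2^32 is also an inverse modulo 2^31, which is why masking at the end is enough.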

Related

Eigen - return type of .cwiseProduct?

I am writing a function in RcppEigen for weighted covariances. In one of the steps I want to take column i and column j of a matrix, X, and compute the cwiseProduct, which should return some kind of vector. The output of cwiseProduct will go into an intermediate variable which can be reused many times. From the docs it seems cwiseProduct returns a CwiseBinaryOp, which itself takes two types. My cwiseProduct operates on two column vectors, so I thought the correct return type should be Eigen::CwiseBinaryOp<Eigen::ColXpr, Eigen::ColXpr>, but I get the error no member named ColXpr in namespace Eigen
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
Rcpp::List Crossprod_sparse(Eigen::MappedSparseMatrix<double> X, Eigen::Map<Eigen::MatrixXd> W) {
int K = W.cols();
int p = X.cols();
Rcpp::List crossprods(W.cols());
for (int i = 0; i < p; i++) {
for (int j = i; j < p; j++) {
Eigen::CwiseBinaryOp<Eigen::ColXpr, Eigen::ColXpr> prod = X.col(i).cwiseProduct(X.col(j));
for (int k = 0; k < K; k++) {
//double out = prod.dot(W.col(k));
}
}
}
return crossprods;
}
I have also tried saving into a SparseVector
Eigen::SparseVector<double> prod = X.col(i).cwiseProduct(X.col(j));
as well as computing, but not saving at all
X.col(i).cwiseProduct(X.col(j));
If I don't save the product at all, the function returns very quickly, hinting that cwiseProduct is not an expensive function. When I save it into a SparseVector, the function is extremely slow, making me think that SparseVector is not the right return type and Eigen is doing extra work to get it into that type.
Recall that Eigen relies on expression templates, so if you don't assign an expression then this expression is essentially a no-op. In your case, assigning it to a SparseVector is the right thing to do. Regarding speed, make sure to compile with compiler optimizations ON (like -O3).
Nonetheless, I believe there is a faster way to write your overall computations. For instance, are you sure that all X.col(i).cwiseProduct(X.col(j)) are non empty? If not, then the second loop should be rewritten to iterate over the sparse set of overlapping columns only. Loops could also be interchanged to leverage efficient matrix products.

Looking for a replayable / somewhat stateless PRNG algorithm

I'm looking for a pseudo-random number generator that is "replayable" and "stateless". Let me elaborate: I need to be able to re-fetch a pseudo-random number based on a parameter to the random function. For example (C-style pseudocode):
int x1 = random(1);
int x2 = random(2);
// and so on with lots of random() calls in between
int new_x1 = random(1);
// now new_x1 is like a "replay" of x1, so x1 == new_x1
The type of arguments doesn't matter (I can typecast whatever is needed), the return value doesn't have to be int; ultimately I'll need 8-bit values.
The question is: what's a good PRNG algorithm that satisfies the requirement that the next pseudo-random value is controlled by a parameter, and not by its internal state which is updated upon each invocation? I don't want to use a crummy solution like the following:
int random(int input) {
srand(input);
return rand();
}
This would have to initialize the PRNG upon every invocation, which seems costly. (I am illustrating this point using the standard srand() / rand(), I know there are better algorithms out there, like Mersenne Twister, but the idea is still the same.)
One approach that might work here would be to use a block cipher like AES or triple-DES. Your pseudorandom generator could then be
int pseudorandomValue(int input) {
return encryptUsingAES(input);
}
This is pseudorandom (since the outputs of AES should be statistically indistinguishable from random) and stateless.
Hope this helps!
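If cryptographic strength is not needed, the same "function of the input" idea can be sketched with a cheap integer mixer in place of AES. The C++ below swaps in the splitmix64 finalizer as the mixing function; this is a substitution, not the AES approach itself, and the names `mix64` and `pseudorandomByte` and the optional `seed` parameter are assumptions made for this example.

```cpp
#include <cstdint>

// splitmix64 finalizer: a bijective, non-cryptographic 64-bit mixer.
inline uint64_t mix64(uint64_t z) {
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Stateless: the result depends only on the arguments, so calling it again
// with the same input "replays" the same value.
inline uint8_t pseudorandomByte(uint64_t input, uint64_t seed = 0) {
    // take the top byte of the mixed value
    return static_cast<uint8_t>(mix64(input ^ mix64(seed)) >> 56);
}
```

Calling `pseudorandomByte(1)` twice returns the same byte, with any number of other calls in between.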
You may use the Xorshift[1,2] based PRNG. This PRNG uses the previous random number to generate the next. The implementation is very efficient, as compared to AES.
For 32-bit implementation:
uint32_t next_rand(uint32_t prev)
{
prev ^= prev << 13;
prev ^= prev >> 17;
prev ^= prev << 5;
return prev;
}
For 64-bit implementation:
uint64_t next_rand(uint64_t prev)
{
prev ^= prev << 21;
prev ^= prev >> 35;
prev ^= prev << 4;
return prev;
}
The random number sequence is "replayable", stateless, and depends on only the initial value, which is the seed.
References:
[1] Wikipedia article on Xorshift.
[2] A paper with detailed math.

Simple random number generator that can generate nth number in series in O(1) time

I do not intend to use this for security purposes or statistical analysis. I need to create a simple random number generator for use in my computer graphics application. I don't want to use the term "random number generator", since people think in very strict terms about it, but I can't think of any other word to describe it.
it has to be fast.
it must be repeatable, given a particular seed.
Eg: If seed = x, then the series a,b,c,d,e,f..... should happen every time I use the seed x.
Most importantly, I need to be able to compute the nth term in the series in constant time.
It seems that I cannot achieve this with rand_r or srand(), since these are state dependent, and I may need to compute the nth term in some unknown order.
I've looked at Linear Feedback Shift registers, but these are state dependent too.
So far I have this:
int rand = (n * prime1 + seed) % prime2
n = used to indicate the index of the term in the sequence. Eg: For
first term, n ==1
prime1 and prime2 are prime numbers where
prime1 > prime2
seed = some number which allows one to use the same function to
produce a different series depending on the seed, but the same series
for a given seed.
I can't tell how good or bad this is, since I haven't used it enough, but it would be great if people with more experience in this can point out the problems with this, or help me improve it..
EDIT - I don't care if it is predictable. I'm just trying to create some randomness in my computer graphics.
Use a cryptographic block cipher in CTR mode. The Nth output is just encrypt(N). Not only does this give you the desired properties (O(1) computation of the Nth output); it also has strong non-predictability properties.
I stumbled on this a while back, looking for a solution for the same problem. Recently, I figured out how to do it in low-constant O(log(n)) time. While this doesn't quite match the O(1) requested by the author, it may be fast enough (a sample run, compiled with -O3, achieved performance of 1 billion arbitrary index random numbers, with n varying between 1 and 2^48, in 55.7s -- just shy of 18M numbers/s).
First, the theory behind the solution:
A common type of RNG is the Linear Congruential Generator (LCG). Basically, LCGs work as follows:
random(n) = (m*random(n-1) + b) mod p
where m, b, and p are constants (see a reference on LCGs for how they are chosen). From this, we can devise the following using a bit of modular arithmetic:
random(0) = seed mod p
random(1) = m*seed + b mod p
random(2) = m^2*seed + m*b + b mod p
...
random(n) = m^n*seed + b*Sum_{i = 0 to n - 1} m^i mod p
= m^n*seed + b*(m^n - 1)/(m - 1) mod p
Computing the above can be a problem, since the numbers will quickly exceed numeric limits. The solution for the generic case is to compute m^n in modulo with p*(m - 1), however, if we take b = 0 (a sub-case of LCGs sometimes called Multiplicative congruential Generators), we have a much simpler solution, and can do our computations in modulo p only.
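For the general case with b != 0, one alternative that avoids the division by (m - 1) is to treat each step as an affine map x -> m*x + b and raise that map to the nth power by repeated squaring. The C++ sketch below does this modulo 2^64 (so unsigned overflow performs the reduction); note that this is a different technique from the p*(m - 1) approach mentioned above. The constants are Knuth's MMIX LCG parameters, used here only as an example, and `Affine`, `compose` and `lcgNth` are hypothetical names.

```cpp
#include <cstdint>

// Represents the map x -> mul * x + add (mod 2^64).
struct Affine {
    uint64_t mul;
    uint64_t add;
};

// Returns the map "apply f first, then g".
inline Affine compose(Affine f, Affine g) {
    return { g.mul * f.mul, g.mul * f.add + g.add };
}

// nth value of the LCG, with lcgNth(seed, 0) == seed, in O(log n) steps.
inline uint64_t lcgNth(uint64_t seed, uint64_t n) {
    Affine step = { 6364136223846793005ULL, 1442695040888963407ULL };
    Affine acc = { 1, 0 };  // identity map
    while (n) {
        if (n & 1) acc = compose(acc, step);
        step = compose(step, step);  // step becomes step applied twice
        n >>= 1;
    }
    return acc.mul * seed + acc.add;
}
```

With b = 0 the `add` component stays zero and this reduces to plain modular exponentiation of m.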
In the following, I use the constant parameters used by RANF (developed by CRAY), where p = 2^48 and m = 44485709377909. The fact that p is a power of 2 reduces the number of operations required (as expected):
#include <cassert>
#include <stdint.h>
#include <cstdlib>
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0001000000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
inline void setPosition(size_t index){
state = seed;
uint64_t mPower = m;
for (uint64_t b = 1; index; b <<= 1){
if (index & b){
state *= mPower;
index ^= b;
}
mPower *= mPower;
}
}
};
#include <cstdio>
void example(){
RANF R(1);
// Gets the number through random-access -- O(log(n))
R.setPosition(12345); // Goes to the nth random number
printf("fast nth number = %llu\n", (unsigned long long)R.getNext());
// Gets the number through standard, sequential access -- O(n)
R.setPosition(0);
for(size_t i = 0; i < 12345; i++) R.getNext();
printf("slow nth number = %llu\n", (unsigned long long)R.getNext());
}
While I presume the author has moved on by now, hopefully this will be of use to someone else.
If you're really concerned about runtime performance, the above can be made about 10x faster with lookup tables, at the cost of compilation time and binary size (it also is O(1) w.r.t the desired random index, as requested by OP)
In the version below, I used c++14 constexpr to generate the lookup tables at compile time, and got to 176M arbitrary index random numbers per second (doing this did however add about 12s of extra compilation time, and a 1.5MB increase in binary size -- the added time may be mitigated if partial recompilation is used).
class RANF{
// MCG constants and state data
static const uint64_t m = 44485709377909ULL;
static const uint64_t n = 0x0001000000000000ULL; // 2^48
static const uint64_t randMax = n - 1;
const uint64_t seed;
uint64_t state;
// Lookup table
struct lookup_t{
uint64_t v[3][65536];
constexpr lookup_t() : v() {
uint64_t mi = RANF::m;
for (size_t i = 0; i < 3; i++){
v[i][0] = 1;
uint64_t val = mi;
for (uint16_t j = 0x0001; j; j++){
v[i][j] = val;
val *= mi;
}
mi = val;
}
}
};
friend struct lookup_t;
public:
// Constructors, which define the seed
RANF(uint64_t seed) : seed(seed), state(seed) {
assert(seed > 0 && "A seed of 0 breaks the LCG!");
}
// Gets the next random number in the sequence
inline uint64_t getNext(){
state *= m;
return state & randMax;
}
// Sets the MCG to a specific index
// Note: idx.u16 indices need to be adapted for big-endian machines!
inline void setPosition(size_t index){
static constexpr auto lookup = lookup_t();
union { uint16_t u16[4]; uint64_t u64; } idx;
idx.u64 = index;
state = seed * lookup.v[0][idx.u16[0]] * lookup.v[1][idx.u16[1]] * lookup.v[2][idx.u16[2]];
}
};
Basically, what this does is split the computation of, for example, m^0xAAAABBBBCCCC mod p, into (m^0xAAAA00000000 mod p)*(m^0xBBBB0000 mod p)*(m^0xCCCC mod p) mod p, and then precompute tables for each of the values in the 0x0000 - 0xFFFF range that could fill AAAA, BBBB or CCCC.
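The decomposition can be checked with a small C++ sketch that builds the three tables at run time instead of with constexpr, and extracts the 16-bit digits with shifts rather than a union (so byte order does not matter). The names `buildTables`, `powTable` and `powSquare` are made up for this example; the multiplier and the 48-bit mask are the RANF parameters used above.

```cpp
#include <cstdint>
#include <vector>

const uint64_t kM = 44485709377909ULL;
const uint64_t kMask48 = (1ULL << 48) - 1;

// tables[i][j] = m^(j * 65536^i), reduced modulo 2^64 by unsigned overflow.
inline std::vector<std::vector<uint64_t>> buildTables() {
    std::vector<std::vector<uint64_t>> t(3, std::vector<uint64_t>(65536));
    uint64_t base = kM;
    for (int i = 0; i < 3; ++i) {
        uint64_t val = 1;
        for (uint32_t j = 0; j < 65536; ++j) {
            t[i][j] = val;
            val *= base;
        }
        base = val;  // now base^65536, the base for the next 16-bit digit
    }
    return t;
}

// m^index mod 2^48 from three lookups (index below 2^48).
inline uint64_t powTable(const std::vector<std::vector<uint64_t>>& t, uint64_t index) {
    return (t[0][index & 0xFFFF] * t[1][(index >> 16) & 0xFFFF]
            * t[2][(index >> 32) & 0xFFFF]) & kMask48;
}

// Reference: m^index mod 2^48 by square-and-multiply.
inline uint64_t powSquare(uint64_t index) {
    uint64_t result = 1, p = kM;
    while (index) {
        if (index & 1) result *= p;
        p *= p;
        index >>= 1;
    }
    return result & kMask48;
}
```

Both functions should agree for every index below 2^48; the table version trades about 1.5MB of memory for a fixed three multiplications.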
RNGs in the usual sense follow a sequence pattern like f(n) = S(f(n-1)).
They also lose information at some point (for example through % mod) for computational convenience, so it is generally not possible to expand the sequence into a function like X(n) = f(n) = a trivial function of n only.
For a generic generator of this form, that means O(n) at best.
To target O(1), you therefore need to abandon the idea of f(n) = S(f(n-1)) and designate a formula directly, so that the Nth number can be calculated without knowing the (N-1)th; this also renders the seed meaningless.
So you end up with a simple algebraic function and not a sequence. For example:
int my_rand(int n) { return 42; } // Don't laugh!
int my_rand(int n) { return 3*n*n + 2*n + 7; }
If you want to put more constraints on the generated pattern (like its distribution), it becomes a complex maths problem.
However, for your original goal, if what you want is constant-time access to pseudo-random numbers, I suggest pre-generating them with a traditional RNG and accessing them through a lookup table.
EDIT: I noticed you are concerned about the table size for a lot of numbers. However, you may introduce some hybrid model, like a table of N entries, and do f(k) = g(tbl[k % N], k), which at least provides a good distribution across N consecutive values.
This demonstrates a PRNG implemented as a hashed counter. This might appear to duplicate R.'s suggestion (using a block cipher in CTR mode as a stream cipher), but for this, I avoided using cryptographically secure primitives: for speed of execution and because security wasn't a desired feature.
If we were trying to create a secure stream cipher with your requirement that any emitted sequence be trivially repeatable, given knowledge of its index...
...then we could choose a secure hash algorithm (like SHA256) and a counter with a lot of bits (maybe 2048 -> sequence repeats every 2^2048 generated numbers without reseeding).
HOWEVER, the version I present here uses Bob Jenkins' famous hash function (simple and fast, but not secure) along with a 64-bit counter (which is as big as integers can get on my system, without needing custom incrementing code).
Code in main demonstrates that knowledge of the RNG's counter (seed) after initialization allows a PRNG sequence to be repeated, as long as we know how many values were generated leading up to the repetition point.
Actually, if you know the counter's value at any point in the output sequence, you will be able to retrieve all values generated previous to that point, AND all values which will be generated afterward. This only involves adding or subtracting ordinal differences to/from a reference counter value associated with a known point in the output sequence.
It should be pretty easy to adapt this class for use as a testing framework -- you could plug in other hash functions and change the counter's size to see what kind of impact there is on speed as well as the distribution of generated values (the only uniformity analysis I did was to look for patterns in the screenfuls of hexadecimal numbers printed by main()).
#include <iostream>
#include <iomanip>
#include <ctime>
using namespace std;
class CHashedCounterRng {
static unsigned JenkinsHash(const void *input, unsigned len) {
unsigned hash = 0;
for(unsigned i=0; i<len; ++i) {
hash += static_cast<const unsigned char*>(input)[i];
hash += hash << 10;
hash ^= hash >> 6;
}
hash += hash << 3;
hash ^= hash >> 11;
hash += hash << 15;
return hash;
}
unsigned long long m_counter;
void IncrementCounter() { ++m_counter; }
public:
unsigned long long GetSeed() const {
return m_counter;
}
void SetSeed(unsigned long long new_seed) {
m_counter = new_seed;
}
unsigned int operator ()() {
// the next random number is generated here
const auto r = JenkinsHash(&m_counter, sizeof(m_counter));
IncrementCounter();
return r;
}
// the default constructor uses time()
// to seed the counter
CHashedCounterRng() : m_counter(time(0)) {}
// you can supply a predetermined seed here,
// or after construction with SetSeed(seed)
CHashedCounterRng(unsigned long long seed) : m_counter(seed) {}
};
int main() {
CHashedCounterRng rng;
// time()'s high bits change very slowly, so look at low digits
// if you want to verify that the seed is different between runs
const auto stored_counter = rng.GetSeed();
cout << "initial seed: " << stored_counter << endl;
for(int i=0; i<20; ++i) {
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl;
}
cout << endl;
cout << "The last line again:" << endl;
rng.SetSeed(stored_counter + 19 * 8);
for(int j=0; j<8; ++j) {
const unsigned x = rng();
cout << setfill('0') << setw(8) << hex << x << ' ';
}
cout << endl << endl;
return 0;
}

Number which occurs only once in the array [duplicate]

Possible Duplicate:
Finding a single number in a list
Given an array of numbers in which every number except one occurs twice, what should be the algorithm to find the number that occurs only once in the array?
Example
a[1..n] = [1,2,3,4,3,1,2]
should return 4
Let the number which occurs only once in the array be x
x <- a[1]
for i <- 2 to n
x <- x ^ a[i]
return x
Since a ^ a = 0 and a ^ 0 = a
Numbers which occur in pair cancel out and the result gets stored in x
Working code in C++
#include <iostream>
template<typename T, size_t N>
size_t size(T(&a)[N])
{
return N;
}
int main()
{
int a [] = {1,2,3,4,3,1,2};
int x = a[0];
for (size_t i = 1; i< size(a) ; ++i)
{
x = x ^ a[i];
}
std::cout << x;
}
Create new int i = 0
XOR each item with i
After all iterations there will be expected number in i
If you have quantities which cannot be reasonably xored (Big Integers or numbers represented as Strings, for example), an alternate approach which is also O(n) time, (but O(n) space rather than O(1) space) would be to simply use a hash table. The algorithm looks like:
Create a hash table of the same size as the list
For every item in the list:
If item is a key in hash table
then remove item from hash table
else add item to hash table with nominal value
At the end, there should be exactly one item in the hash table
I would write C or C++ code, but C has no built-in hash table, and C++ only gained one (std::unordered_map) in C++11; before that the STL offered just std::map, which is typically based on a red-black tree. And, unfortunately, I don't have a C# compiler handy to test for syntax errors, so I'm giving you Java code. It's pretty similar, though.
import java.util.Hashtable;
import java.util.List;
class FindUnique {
public static <T> T findUnique(List<T> list) {
Hashtable<T,Character> ht = new Hashtable<T,Character>(list.size());
for (T item : list) {
if (ht.containsKey(item)) {
ht.remove(item);
} else {
ht.put(item,'x');
}
}
return ht.keys().nextElement();
}
}
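For comparison, the same toggle-in-a-hash-table idea can be sketched in C++11 or later with std::unordered_set. The function name `findUnique` mirrors the Java version; like it, the sketch assumes the input contains exactly one unpaired element, and it requires that T is hashable.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Toggle each item in and out of the set; paired items cancel out.
template <typename T>
T findUnique(const std::vector<T>& items) {
    std::unordered_set<T> seen;
    for (const T& item : items) {
        auto it = seen.find(item);
        if (it != seen.end()) {
            seen.erase(it);     // second occurrence: remove
        } else {
            seen.insert(item);  // first occurrence: remember
        }
    }
    // Assumption: exactly one element is left (undefined for empty results).
    return *seen.begin();
}
```

This works for values such as strings that cannot be XORed, at the cost of O(n) extra space.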
Well, I only know of the brute-force algorithm, which is to traverse the whole array and check.
The code will look like this (in C#):
int k = 0;
for(int i=0 ; i < array.Length ; i++)
{
k ^= array[i];
}
return k;
zerkms' answer in C++
#include <functional> // std::bit_xor
#include <numeric>    // std::accumulate
int a[] = { 1,2,3,4,3,1,2 };
int i = std::accumulate(a, a + 7, 0, std::bit_xor<int>());
You could sort the array and then find the first element that doesn't have a pair. That would require several loops for sorting and a loop for finding the single element.
But a simpler method would be to set the duplicated keys to zero or to a value that is not possible in the current format. This depends on the programming language as well, as you cannot change key types in C++, unlike C#.

calculating the number of bits using K&R method with infinite memory

I got the answer to the question about counting the number of set bits from here:
How to count the number of set bits in a 32-bit integer?
long count_bits(long n) {
unsigned int c; // c accumulates the total bits set in n
for (c = 0; n; c++)
n &= n - 1; // clear the least significant bit set
return c;
}
It is also simple to understand. I found the best answer to be Brian Kernighan's method, posted by hoyhoy, who adds the following at the end:
Note that this is a question used during interviews. The interviewer will add the caveat that you have "infinite memory". In that case, you basically create an array of size 2^32 and fill in the bit counts for the numbers at each location. Then, this function becomes O(1).
Can somebody explain how to do this, if I have infinite memory?
The fastest way I have ever seen to populate such an array is ...
array[0] = 0;
for (i = 1; i < NELEMENTS; i++) {
array[i] = array[i >> 1] + (i & 1);
}
Then to count the number of set bits in a given number (provided the given number is less than NELEMENTS) ...
numSetBits = array[givenNumber];
If your memory is not infinite, I often see NELEMENTS set to 256 (for one byte's worth), with the number of set bits in each byte of your integer added together.
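A minimal C++ sketch of that byte-table variant follows, assuming a 32-bit unsigned input. The names `byteCounts` and `countBits32` are made up for this example; the table is filled with the same recurrence as above.

```cpp
#include <array>
#include <cstdint>

// 256-entry table: byteCounts()[i] is the number of set bits in the byte i.
inline const std::array<uint8_t, 256>& byteCounts() {
    static const std::array<uint8_t, 256> table = [] {
        std::array<uint8_t, 256> t{};  // t[0] == 0
        for (int i = 1; i < 256; ++i) {
            // bits of i = bits of i without its lowest bit + that lowest bit
            t[i] = static_cast<uint8_t>(t[i >> 1] + (i & 1));
        }
        return t;
    }();
    return table;
}

// Sum the table entries for the four bytes of a 32-bit value.
inline int countBits32(uint32_t v) {
    const std::array<uint8_t, 256>& t = byteCounts();
    return t[v & 0xFF] + t[(v >> 8) & 0xFF] + t[(v >> 16) & 0xFF] + t[v >> 24];
}
```

This keeps the lookup cost constant (four table reads) while using 256 bytes instead of 2^32 entries.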
int counts[MAX_LONG]; // conceptual: one entry per possible value ("infinite memory")
void init() {
for (long i = 0; i < MAX_LONG; i++)
{
counts[i] = count_bits(i); // count_bits() as given in the question
}
}
int count_bits_o1(long number)
{
return counts[number];
}
You can probably pre-populate the array more wisely, i.e. fill with zeros, then every second index add one, then every fourth index add 1, then every eighth index add 1 etc, which might be a bit faster, although I doubt it...
Also, you might account for unsigned values.
