How do I generate a predictable random stream in hierarchy in c#? - random

I am making a procedural game with hierarchy.
So object A will have 10 children.
Each child will have 10 children and so on.
Now suppose I want to give each child a random colour, and a random position (assume these are given by integers).
Therefore, let X be the "ID" of an object.
Let COLOUR and POSITION be enums of type PROPERTY.
Then I want to generate random integers:
int GenerateRandomInteger(PROPERTY P, int childNumber);
So I can use:
int N = parentObject.GenerateRandomInteger(COLOUR, 7);
For example.
Any ideas how to go about this?

In this case, GenerateRandomInteger should be implemented as a hash function. A hash function takes arbitrary data (here, the values of P and childNumber) and outputs a hash code. For the purposes of a game:
The hash function should have the avalanche property, meaning that changing any single bit of the input changes each bit of the hash code with a probability of about one half.
Good hash functions here include MurmurHash3 and xxHash.
This answer also assumes that childNumber is unique throughout the application, rather than unique for a given parent.
The resulting hash code can then be used to generate a pseudorandom color and a position (for example, the first 24 bits of the hash code can be extracted and treated as an 8-bit-per-component RGB color). But further details on how this will work will depend on what programming language you're using and what ranges are acceptable for colors and positions, which you didn't specify in your question (there are several languages that use ints and enums, for example).
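As a rough illustration of the hash-based approach, here is a minimal Python sketch. It rests on several assumptions: BLAKE2b from Python's hashlib stands in for MurmurHash3 or xxHash (neither is in the standard library), the names Property, generate_random_integer and to_rgb are hypothetical, the parent's ID is mixed in alongside the property and child number, and the colour is taken from the low 24 bits of the hash code.

```python
import hashlib
import struct
from enum import IntEnum


class Property(IntEnum):
    # Hypothetical stand-in for the PROPERTY enum in the question.
    COLOUR = 0
    POSITION = 1


def generate_random_integer(object_id, prop, child_number):
    """Return a repeatable 32-bit integer for (object_id, prop, child_number)."""
    # Pack the inputs into a fixed 20-byte layout so equal inputs give equal bytes.
    data = struct.pack("<qiq", object_id, int(prop), child_number)
    digest = hashlib.blake2b(data, digest_size=4).digest()
    return int.from_bytes(digest, "little")


def to_rgb(hash_code):
    """Split the low 24 bits of a hash code into 8-bit R, G and B components."""
    return ((hash_code >> 16) & 0xFF, (hash_code >> 8) & 0xFF, hash_code & 0xFF)
```

Calling generate_random_integer(1, Property.COLOUR, 7) always yields the same value, so no generator state has to be stored per object; the hierarchy can be regenerated on demand from the IDs alone.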

Related

Weka randomseed parameter for Randomized filter

There is a filter in WEKA in the preprocess tab named Randomized. The definition of this filter is Randomly shuffles the order of instances passed through it. The filter has a parameter called randomseed which is by default set as 42.
I found some definitions of randomseed, such as: A random seed (or seed state, or just seed) is a number (or vector) used to initialize a pseudorandom number generator.
Seed function is used to save the state of a random function, so that it can generate same random numbers on multiple executions of the code on the same machine or on different machines (for a specific seed value). The seed value is the previous value number generated by the generator.
The number "42" was apparently chosen as a tribute to the "Hitch-hiker's Guide" books by Douglas Adams, as it was supposedly the answer to the great question of "Life, the universe, and everything" as calculated by a computer (named "Deep Thought") created specifically to solve it.
All those answers on the internet made me more confused.
I cannot understand what randomseed has to do with the random shuffle. Does this mean that 42 instances are taken from the beginning and shuffled, then the next 42 instances are taken and shuffled, and so on until the end?
The seed value simply initializes the java.util.Random object that generates the sequence of pseudo random numbers used for randomization.
A different seed value will result in a different sequence, therefore resulting in a different randomization of rows in your dataset.
The Randomize filter initializes such a Random object and then calls the randomize method of the weka.core.Instances object it currently holds. The code of the randomize method (at time of writing this is Weka 3.9.6) looks like this:
public void randomize(Random random) {
  for (int j = numInstances() - 1; j > 0; j--) {
    swap(j, random.nextInt(j + 1));
  }
}
All options of option-handling classes in Weka have a default value. In case of the Randomize filter, the seed value has the default value of 42. It could have been anything, but someone was a fan of the Hitchhiker's Guide to the Galaxy and chose that value.
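To make the role of the seed concrete, here is a small Python sketch of the same kind of swap loop. It is only an illustration: it uses Python's random.Random rather than java.util.Random, so the actual orderings will differ from Weka's, and the helper name shuffled_with_seed is hypothetical.

```python
import random


def randomize(items, rng):
    """Shuffle items in place with the same backwards swap loop as above."""
    for j in range(len(items) - 1, 0, -1):
        k = rng.randrange(j + 1)  # random index in 0..j, like nextInt(j + 1)
        items[j], items[k] = items[k], items[j]


def shuffled_with_seed(n, seed):
    """Return the row indices 0..n-1 shuffled by a generator built from seed."""
    rows = list(range(n))
    randomize(rows, random.Random(seed))
    return rows
```

Two calls with the same seed, such as shuffled_with_seed(10, 42), return the same order. The seed is not a block size: the whole dataset is shuffled in one pass, and the seed only decides which permutation comes out.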

What abstract data type is this?

Is the following a common data type (i.e. does it have a name)?
Its unique characteristic is, unlike a regular Set, that it contains the "universe" on initialisation with O(C) memory overhead, and a max memory overhead of O(N/2) (which only occurs when you remove every-other element):
> s = new Structure(701)
s = Structure(0-700)
> s.remove(100)
s = Structure(0-99, 101-700)
> s.add(100)
s = Structure(0-700)
> s.remove(200)
s = Structure(0-199, 201-700)
> s.remove(202)
s = Structure(0-199, 201, 203-700)
> s.removeAll()
s = Structure()
Does something like this have a standard name?
I've used this many times in the past and seen it used in things like plane-sweep algorithms for polygon clipping.
Sometimes the abstract data type it represents is just a set, and the data structure is an optimization. I use this for representing the set of matching characters given by a regular expression like [^a-zA-Z0-9.-], for example, and to perform intersection, union, and other operations on those sets.
This sort of data structure is implemented on top of some other ordered set or map structure, by simply storing the keys where membership in the set changes instead of the keys in the set itself. In all the other cases where I've seen this sort of thing done, the authors refer to that underlying structure instead of giving a name to the concept itself.
I like the idea of having a name for it, though, since as I said I've used it myself many times. Maybe I would call it an "in & out set" in honor of the hamburger chain I liked the best back when I ate hamburgers.
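The idea of storing only the keys where membership changes can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the class name ToggleSet and its methods are hypothetical, and a plain sorted list stands in for the ordered set or map that a real implementation would likely use.

```python
from bisect import bisect_left, bisect_right


class ToggleSet:
    """Set of integers stored as the sorted points where membership flips."""

    def __init__(self, size):
        # Initially contains the whole universe 0..size-1 with two boundaries.
        self._changes = [0, size] if size > 0 else []

    def __contains__(self, x):
        # x is a member when an odd number of boundaries lie at or below it.
        return bisect_right(self._changes, x) % 2 == 1

    def _flip(self, boundary):
        # Insert the boundary, or delete it if it is already present.
        i = bisect_left(self._changes, boundary)
        if i < len(self._changes) and self._changes[i] == boundary:
            del self._changes[i]
        else:
            self._changes.insert(i, boundary)

    def add(self, x):
        if x not in self:
            self._flip(x)
            self._flip(x + 1)

    def remove(self, x):
        if x in self:
            self._flip(x)
            self._flip(x + 1)

    def ranges(self):
        """Return the members as a list of inclusive (first, last) ranges."""
        c = self._changes
        return [(c[i], c[i + 1] - 1) for i in range(0, len(c), 2)]
```

With s = ToggleSet(701), calling s.remove(200) and then s.remove(202) leaves three ranges, matching the transcript in the question. Adding an element back cancels the two boundaries again, which is why the structure shrinks as well as grows.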
It's a Compressed Bit Set or Compressed Bitmap.
A Bit Set or Bitmap is a set specifically designed for storing Integers. Most languages offer standard implementations of these. They typically work by assigning a 1 to the Nth bit in an internal array of Integers where N is the number you're adding to the set. 0 indicates the value is not present. The memory usage for these types of Bit Sets is dictated by the largest number you store.
A Compressed Bit Set is one that compacts ranges of 0s and 1s.
In this case, the question demonstrates a type of compression called "run-length encoding" (thank you @Ralf Kleberhoff), so it is specifically a Run-length Encoded Bitmap.
Common implementations of Compressed Bitmaps (from newest-to-oldest) are:
Roaring Bitmaps (only one to provide "good random access")
EWAH
WAH
Oracle BBC

Using a specific seed in the `RANDOM_NUMBER` algorithm

I'm looking to use a specific set of seeds for the intrinsic function RANDOM_NUMBER (a PRNG). What I've read so far is that the seed value can be set via calling RANDOM_SEED, specifically RANDOM_SEED(GET = array). My confusion is how to (if it's possible) set a specific value for the algorithm, for instance in the RAND, RANDU, or RANDM algorithms one can specify their own seed directly. I'm not sure how to set the seed, as the get function seems to take an array. If it takes an array, does it always pull the seed value from a specific index in the array?
Basically, is there a way to set a specific single seed value? If so, would someone be able to write-it out?
As a side note - I'm attempting to set my seed because allegedly one of the other PRNGs I mentioned only works well with "large odd numbers" according to my professor, so I decided that I may as well control this when comparing the PRNG's.
First, RANDOM_SEED only controls the seed of RANDOM_NUMBER(). If you use any other random number generator subroutine or function, it will most probably not affect it at all; if it does, then only in some compiler-specific way that you must look up in the manual.
Second, you should not care at all whether the seed array contains 1, 4 or 42 integers, it doesn't matter because the PRNG uses the bits from the whole array in some unspecified custom way. For example you may be generating 64 bit real numbers, but the seed array is made of 32 bit integers. You cannot simply say which integer from the seed array does what. You can view the whole seed array as one big integer number that is cut into smaller pieces if you want.
Regarding your professor's advice, who knows what he meant, but the seed is probably set by some procedure of that particular generator, and not by the standard RANDOM_SEED, so you must read the documentation for that generator.
And how to use a specific seed in RANDOM_SEED? It was described on this site several times; just search for RANDOM_SEED in the top right search field, really. But it is simple: once you know the size of the array, set it to any non-trivial numbers you want (you need enough non-zero bits) and use put=. That's really all. Just don't think about individual values in the array; the whole array is one piece of data.

A good hashing function for a non-uniform sequence of uniformly distributed 4 bits values?

I have a very specific problem:
I have uniformly random values spread on a 15x50 grid and the sample I want to hash corresponds to a square of 5x5 cells centered around any possible grid position.
The number of samples can thus vary from 25 (away from borders, most cases) to 20, 15 (near a border) down to a minimum of 9 (in a corner).
So even though the cell values are random, the location introduces a deterministic variation in the sequence length.
The hash table size is a small number, typically between 50 and 20.
The function will operate on a large set of randomly generated grids (a few hundreds/thousands), and might be called a few thousands times per grid. The positions on the grid can be considered random.
I would like a function that could spread the 15x50 possible samples as evenly as possible.
I have tried the following pseudo-code:
int32 hash = 0;
int i = 0; // I guess i could take any initial value and even be left uninitialized, but fixing one makes the function deterministic
foreach (value in block)
{
    hash ^= (value << (i % 28));
    i++;
}
hash %= table_size;
but the results, though not grossly imbalanced, do not seem very smooth to me. Maybe it's because the sample is too small, but the circumstances make it difficult to run the code on a bigger sample, and I would rather not have to write a complete test harness if some computer savvy has an answer ready for me :).
I am not sure pairing the values two by two and using a general purpose byte hashing strategy would be the best solution, especially since the number of values might be odd.
I have thought of using a 17th value to represent off-grid cells, but that seems to introduce a bias (the sequences from cells near a border will have a lot of "off grid" values).
I am not sure either what would be the best way to test the efficiency of various solutions (how many grids shall I generate to have an idea of the performances, for instance).
http://www.partow.net/programming/hashfunctions/
Here are a few different hash functions from experts in various fields. The functions are designed for 8-bit values, but I am sure you can extend them to your case. I don't know which one to suggest, but I think any of them should work better than your current idea.
The problem with your current approach is that each value only touches a few bits of the hash, at the position it was shifted to. If the table size is a power of two, for example 64, the final modulo keeps only the low 6 bits, so only the few values that were shifted by less than 6 positions influence the result and all the others are lost.
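The effect of the final modulo can be checked with a short Python sketch. It reimplements the pseudo-code from the question under the assumption of a 32-bit hash and a table size of 64; the helper name changes_bucket is hypothetical.

```python
def xor_shift_hash(block, table_size):
    """Python version of the question's pseudo-code, kept to 32 bits."""
    h = 0
    for i, value in enumerate(block):
        h ^= (value << (i % 28)) & 0xFFFFFFFF
    return h % table_size


def changes_bucket(block, index, new_value, table_size):
    """Report whether replacing block[index] moves the sample to another bucket."""
    altered = list(block)
    altered[index] = new_value
    return xor_shift_hash(block, table_size) != xor_shift_hash(altered, table_size)
```

For a 25-value block and a table size of 64, changing the value at index 10 never changes the bucket, because that value only touches bits 10 to 13 of the hash. Table sizes that are not powers of two suffer less, but the high bits still mix poorly.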
Despite your scepticism I would just shove them through a standard hash function.
If they are well randomised (and relatively independent - you don't say) to begin with you probably don't need to do too much work. Fowler-Noll-Vo (FNV) is a good candidate in these circumstances.
FNV operates on a series of 8-bit input and your input is (logically) 4-bit.
I would start without even bothering to pack 'two by two' as you describe.
If you feel like trying that, just logically pad odd length series with the message length (reduced to a 4 bit value obviously).
I wouldn't expect that packing to improve the hash. It may save you a tiny number of cycles because it swaps a relatively expensive * with a << and a |.
Try both and report back!
Here are implementations of packed and 'normal' versions of FNV1a in C:
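#include <inttypes.h>
#include <stddef.h>
static const uint32_t sFNVOffsetBasis = 2166136261u;
static const uint32_t sFNVPrime = 16777619u;
/* FNV-1a over 4-bit values, packing two values into each hashed byte. */
uint32_t FNV1aPacked4Bit(const uint8_t *const pBytes, const size_t pSize) {
    uint32_t rHash = sFNVOffsetBasis;
    /* Stop before an unpaired last element so pBytes[i + 1] stays in bounds. */
    for (size_t i = 0; i + 1 < pSize; i += 2) {
        rHash = rHash ^ (uint32_t)(pBytes[i] | (pBytes[i + 1] << 4));
        rHash = rHash * sFNVPrime;
    }
    if (pSize % 2) { /* Length is odd. The loop missed the last element. */
        /* Pad the high nibble with bits 1-4 of the length. */
        rHash = rHash ^ (uint32_t)(pBytes[pSize - 1] | ((pSize & 0x1E) << 3));
        rHash = rHash * sFNVPrime;
    }
    return rHash;
}
/* Plain FNV-1a, hashing one 4-bit value per byte. */
uint32_t FNV1a(const uint8_t *const pBytes, const size_t pSize) {
    uint32_t rHash = sFNVOffsetBasis;
    for (size_t i = 0; i < pSize; ++i) {
        rHash = (rHash ^ pBytes[i]) * sFNVPrime;
    }
    return rHash;
}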
NB: I've edited it to skip the first bit when adding in the length. Obviously the bottom bit of an odd length is 100% biased to 1. I don't know how length is distributed. It may be wiser to put it in at the start than the end.
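For anyone who wants to try this without a full test harness, here is a rough Python sketch of the unpacked FNV-1a variant together with a bucket counter. It is an approximation under assumptions: the helper names bucket_counts and random_samples are hypothetical, and the sample lengths are simply drawn from the sizes mentioned in the question rather than from real grid positions.

```python
import random
from collections import Counter

FNV_OFFSET_BASIS = 2166136261
FNV_PRIME = 16777619


def fnv1a_32(values):
    """32-bit FNV-1a over an iterable of small integers (one value per step)."""
    h = FNV_OFFSET_BASIS
    for v in values:
        h = ((h ^ v) * FNV_PRIME) & 0xFFFFFFFF
    return h


def bucket_counts(samples, table_size):
    """Count how many samples land in each bucket of a table of table_size."""
    counts = Counter(fnv1a_32(s) % table_size for s in samples)
    return [counts.get(b, 0) for b in range(table_size)]


def random_samples(count, rng):
    """Generate blocks of random 4-bit values with lengths 9, 15, 20 or 25."""
    lengths = (9, 15, 20, 25)
    return [[rng.randrange(16) for _ in range(rng.choice(lengths))]
            for _ in range(count)]
```

Printing bucket_counts(random_samples(10000, random.Random(1)), 37) gives a quick view of how evenly the buckets fill. Comparing the largest and smallest counts across a few table sizes is usually enough to spot a badly behaved hash.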

A function where small changes in input always result in large changes in output

I would like an algorithm for a function that takes n integers and returns one integer. For small changes in the input, the resulting integer should vary greatly. Even though I've taken a number of courses in math, I have not used that knowledge very much and now I need some help...
An important property of this function should be that if it is used with coordinate pairs as input and the result is plotted (as a grayscale value for example) on an image, any repeating patterns should only be visible if the image is very big.
I have experimented with various algorithms for pseudo-random numbers with little success and finally it struck me that md5 almost meets my criteria, except that it is not for numbers (at least not from what I know). That resulted in something like this Python prototype (for n = 2, it could easily be changed to take a list of integers of course):
import hashlib
def uniqnum(x, y):
    # md5 needs bytes in Python 3, so encode the string first
    return int(hashlib.md5((str(x) + ',' + str(y)).encode()).hexdigest()[-6:], 16)
But obviously it feels wrong to go over strings when both input and output are integers. What would be a good replacement for this implementation (in pseudo-code, python, or whatever language)?
A "hash" is the solution created to solve exactly the problem you are describing. See wikipedia's article
Any hash function you use will be nice; hash functions tend to be judged based on these criteria:
The degree to which they prevent collisions (two separate inputs producing the same output) -- a by-product of this is the degree to which the function minimizes outputs that may never be reached from any input.
The uniformity of the distribution of their outputs, given a uniformly distributed set of inputs.
The degree to which small changes in the input create large changes in the output.
(see perfect hash function)
Given how hard it is to create a hash function that maximizes all of these criteria, why not just use one of the most commonly used and relied-on existing hash functions there already are?
From what it seems, turning integers into strings almost seems like another layer of encryption! (which is good for your purposes, I'd assume)
However, your question asks for hash functions that deal specifically with numbers, so here we go.
Hash functions that work over the integers
If you want to borrow already-existing algorithms, you may want to dabble in pseudo-random number generators
One simple one is the middle square method:
Take an n-digit number
Square it
Pad the square with leading zeros to 2n digits and keep the middle n digits
i.e.,
1111 => 01234321 => 2343
so 1111 would be "hashed" to 2343 by the middle square method.
This method isn't very effective in general, but for a small number of hashes it has very low collision rates, a uniform distribution, and great chaos potential (small changes => big changes). If you have many values, it is time to look for something else...
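The three steps above can be sketched in Python as follows. This is a minimal illustration; the function name middle_square and its digits parameter are assumptions, and the number is assumed to have at most that many digits.

```python
def middle_square(number, digits):
    """Square number and return the middle `digits` digits of the result."""
    # Pad to twice the original width so the middle is well defined.
    squared = str(number * number).zfill(2 * digits)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])
```

The leading-zero padding matters: without it, squares with fewer than 2n digits would be cut at the wrong offset.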
The grand-daddy of all feasibly efficient and simple random number generators is the Mersenne Twister (http://en.wikipedia.org/wiki/Mersenne_twister). In fact, an implementation is probably out there for every programming language imaginable. Your hash "input" is something that will be called a "seed" in their terminology.
In conclusion
Nothing wrong with string-based hash functions
If you want to stick with the integers and be fancy, try using your number as a seed for a pseudo-random number generator.
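As a sketch of the seed-based option, the following Python function uses the standard random module, which is built on the Mersenne Twister. The function name seeded_value is hypothetical, and the way the two coordinates are combined into one seed (32 bits each) is just one possible assumption.

```python
import random


def seeded_value(x, y):
    """Return a repeatable 32-bit integer for the coordinate pair (x, y)."""
    # Mask each coordinate to 32 bits so negative inputs give distinct seeds.
    seed = ((x & 0xFFFFFFFF) << 32) | (y & 0xFFFFFFFF)
    return random.Random(seed).getrandbits(32)
```

Creating a fresh generator for every call is comparatively slow, so for per-pixel work a dedicated integer hash is likely to be the cheaper choice.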
Hashing fits your requirements perfectly. If you really don't want to use strings, find a Hash library that will take numbers or binary data. But using strings here looks OK to me.
Bob Jenkins' mix function is a classic choice, at least when n=3.
As others point out, hash functions do exactly what you want. Hashes take bytes - not character strings - and return bytes, and converting between integers and bytes is, of course, simple. Here's an example python function that works on 32 bit integers, and outputs a 32 bit integer:
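import hashlib
import struct
def intsha1(ints):
    # pack the integers as big-endian signed 32-bit values
    data = struct.pack('>%di' % len(ints), *ints)
    digest = hashlib.sha1(data).digest()
    # unpack returns a tuple, so take its only element
    return struct.unpack('>i', digest[:4])[0]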
It can, of course, be easily adapted to work with different length inputs and outputs.
Have a look at this; maybe it will inspire you:
Chaotic system
In chaotic dynamics, small changes vary results greatly.
An x-bit block cipher takes a number and effectively converts it to another x-bit number. You could combine (sum/multiply?) your input numbers and encipher the result, or iteratively encipher each number, similar to CBC or another chained mode. Search for 'format-preserving encryption'. It is possible to create a 32-bit block cipher (though such ciphers are not widely available) and use it to create a 'hashed' output. The main difference between a hash and encryption is that a hash is irreversible.
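To show what a 32-bit block cipher used as a mixer might look like, here is a toy Feistel network in Python. It is purely illustrative and not secure: the round function, the four round keys and all names are assumptions made up for this sketch. The Feistel structure guarantees only that the mapping is reversible, so distinct inputs always give distinct outputs.

```python
MASK16 = 0xFFFF
ROUND_KEYS = (0x1A2B, 0x3C4D, 0x5E6F, 0x7081)  # arbitrary illustrative keys


def _round(half, key):
    """Toy 16-bit round function; any deterministic function would do."""
    return (((half ^ key) * 0x9E3779B1) >> 16) & MASK16


def encrypt32(value, keys=ROUND_KEYS):
    """Map a 32-bit integer to another 32-bit integer, reversibly."""
    left, right = (value >> 16) & MASK16, value & MASK16
    for key in keys:
        left, right = right, left ^ _round(right, key)
    return (left << 16) | right


def decrypt32(value, keys=ROUND_KEYS):
    """Invert encrypt32 by running the rounds backwards."""
    left, right = (value >> 16) & MASK16, value & MASK16
    for key in reversed(keys):
        left, right = right ^ _round(left, key), left
    return (left << 16) | right
```

Because the mapping is a permutation of the 32-bit integers, it cannot produce collisions for single-number inputs, which is one way it differs from a hash.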

Resources