Random String generation from a given string, and inverse transform - algorithm

I am working on a requirement where a function f uses a string s as a seed and generates n strings y0..yn. I can easily do this, but I also want the inverse, i.e. applying f^-1 to any generated string yi should give me back s.
y0 = f(s) # first time I call f(s) it gives me y0
y1 = f(s) # second time I call f(s) it gives me y1
...
yi = f(s) # ith time I call f(s) it gives me yi
and so on.
The inverse function,
s = f^-1(yi)
How can I find such functions f and f^-1? The other constraint is that these strings cannot be too long, say 20-25 characters at most.
Any suggestions, please?

OK, this will get too channel-coding specific if I treat it in full breadth here, but:
These are mathematical concepts, so let's map strings to numbers and look at them algebraically:
Your 20-character string space, assuming we're just using the 128 common ASCII characters, has 128^20 = 2^140 elements. That's pretty many elements.
However, communication technology has a method called scrambling which is a reversible process of mingling the bits in a sequence in a way that spreads the per-bit energy over the whole sequence. That leads to pretty randomly looking bit streams. It's typically implemented using feedback shift registers.
It's possible to find a 2^140-state LFSR that fulfills your scrambling needs, and you can interpret the output of a multiplicative scrambler as the next element in your sequence.
However, please be aware that your problem is a hard one, which I hope I've illustrated sufficiently -- getting something that has good random properties is a harsh thing, and I can't recommend implementing something like that yourself -- it's going to make problems as soon as you need to rely on mathematical properties of your pseudorandom string.
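As a sketch of the idea (not a recommendation to roll your own): a multiplicative scrambler XORs each input bit with feedback taken from previous output bits, and the matching descrambler inverts it exactly. The tap positions and register width below are illustrative, not taken from any particular standard.

```python
def scramble(bits, taps=(18, 23), nbits=23):
    # Multiplicative (self-synchronizing) scrambler: each output bit is the
    # input bit XORed with feedback computed from earlier *output* bits
    # held in a shift register.
    state, out = 0, []
    for b in bits:
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        o = b ^ fb
        out.append(o)
        state = ((state << 1) | o) & ((1 << nbits) - 1)
    return out

def descramble(bits, taps=(18, 23), nbits=23):
    # Same feedback computation, but the register shifts in the *received*
    # bits, so this inverts scramble() exactly.
    state, out = 0, []
    for b in bits:
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        out.append(b ^ fb)
        state = ((state << 1) | b) & ((1 << nbits) - 1)
    return out
```

Because descramble(scramble(bits)) == bits for any bit sequence, the mapping is information-preserving, which is the f / f^-1 property being asked for (on bit streams, not directly on 20-character strings).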


Determinate State of Linear Congruential Generator from Results

I am curious on how someone would go about determining the state of a Linear Congruential Generator given its output.
X(n+1) = (aX(n) + c) mod p
Since the values returned are deterministic and the formula is well known, it should be possible to obtain the state's value. What exactly is the best way to do this?
Edit:
I was at work when I posted this and this isn't work related, so I didn't spend much time and should have elaborated (much) further.
Assume this is used to generate non-integer values between 0 and 1, but its only visible output is true or false with a 50/50 spread. Assume the implementation is also known, so the values of a, c and p are known, but not X.
Would it be possible, with a finite amount of output, to determine the value of X?
Well, in the simplest case, the output IS the state -- the output is the sequence X0, X1, X2, ... each element of which is the internal state at one step.
More commonly, the LCG output will be divided down to generate uniform numbers in the range [0,k) rather than [0,p) (the values output will be floor(k*Xn/p)), so it will only tell you the upper bits of the internal state. In this case, each output value gives you a range of possible values for the state at that step. By correlating multiple consecutive values, you can narrow down the range.
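A toy sketch of that narrowing process, with an illustrative small-modulus LCG whose only visible output is the top bit (the constants and function names here are mine, chosen only so the search stays tiny):

```python
a, c, p, k = 75, 74, 2**16 + 1, 2   # illustrative small LCG; k=2 -> 1-bit output

def step(x):
    return (a * x + c) % p

def outputs(x, n):
    # only floor(k*X/p) -- here the top bit of the state -- is visible
    out = []
    for _ in range(n):
        x = step(x)
        out.append(k * x // p)
    return out

def recover(obs):
    # start with every state consistent with the first visible bit, then
    # advance all candidates one step per observation and prune mismatches
    candidates = [x for x in range(p) if k * x // p == obs[0]]
    for o in obs[1:]:
        candidates = [step(x) for x in candidates]
        candidates = [x for x in candidates if k * x // p == o]
    return candidates   # possible internal states at the last observed step
```

With around 20 one-bit observations, the ~16-bit state space typically collapses to a single candidate.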

Bin-packing (or knapsack?) problem

I have a collection of 43 to 50 numbers ranging from 0.133 to 0.005 (but mostly on the small side). I would like to find, if possible, all combinations that have a sum between L and R, which are very close together.*
The brute-force method takes 2^43 to 2^50 steps, which isn't feasible. What's a good method to use here?
Edit: The combinations will be used in a calculation and discarded. (If you're writing code, you can assume they're simply output; I'll modify as needed.) The number of combinations will presumably be far too large to hold in memory.
* L = 0.5877866649021190081897311406, R = 0.5918521703507438353981412820.
The basic idea is to convert it to an integer knapsack problem (which is easy).
Choose a small real number e and round the numbers in your original problem to ones representable as k*e with integer k. The smaller e is, the larger the integers will be (an efficiency tradeoff), but the solution of the modified problem will be closer to your original one. An e = d/(4*43), where d is the width of your target interval, should be small enough.
If the modified problem has an exact solution summing to the middle (rounded to e) of your target interval, then the original problem has one somewhere within the interval.
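A sketch of that reduction (the function name and safety margin are mine): scale every value by 1/e, then run a standard subset-sum reachability pass over the resulting integers. Because of rounding, a sum found near the middle of [L, R] certifies a real combination inside the interval, while combinations hugging the endpoints may be missed.

```python
def has_combination_near_mid(values, L, R):
    d = R - L
    e = d / (4 * len(values))              # step size suggested above
    ints = [round(v / e) for v in values]  # integer knapsack weights
    mid = round((L + R) / 2 / e)
    reachable = {0}                        # integer subset sums reachable so far
    for k in ints:
        reachable |= {s + k for s in reachable}
    # each item contributes at most e/2 rounding error, so allow a margin
    # of n units of e around the middle
    margin = len(values)
    return any(mid - margin <= s <= mid + margin for s in reachable)
```

This only decides existence; enumerating the actual combinations would keep the contributing subsets alongside each reachable sum, as the code later in this thread does.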
You haven't given us enough information, but it sounds like you are in trouble if you actually want to OUTPUT every possible combination. For example, it's consistent with what you told us that every number is ~0.027; if so, every collection of half of the elements will satisfy your criterion. But there are 43 choose 21 such sets, which means you would have to output at least 1052049481860 sets -- too many to be feasible.
Certainly the running time will be no better than the length of the required output.
Actually, there is a quicker way around this:
(python)
sums_possible = [(0, [])]
# sums_possible is a list of tuples: (total, numbers_that_yield_this_total)
for number in numbers:
    sums_possible_for_this_number = []
    for total, parts in sums_possible:
        sums_possible_for_this_number.append((total + number, parts + [number]))
    sums_possible = sums_possible + sums_possible_for_this_number
results = [parts for total, parts in sums_possible if L <= total <= R]
Also, Aaron is right, so this may or may not be feasible for you.

Is there "good" PRNG generating values without hidden state?

I need a good pseudo random number generator that can be computed like a pure function from its previous output, without any hidden state. By "good" I mean:
I must be able to parametrize the generator in such a way that running it for 2^n iterations with any parameters (or with some large subset of them) covers all or almost all values between 0 and 2^n - 1, where n is the number of bits in the output value.
Combined generator output of n + p bits must cover all or almost all values between 0 and 2^(n + p) - 1 if I run it for 2^n iterations for every possible combination of its parameters, where p is the number of bits in parameters.
For example, an LCG can be computed as a pure function and it can meet the first condition, but it cannot meet the second one. Say we have a 32-bit LCG with m = 2^32 held constant; our p = 64 (two 32-bit parameters a and c) and n + p = 96, so we must take the output three ints at a time to meet the second condition. Unfortunately, that condition cannot be met, because of the strictly alternating sequence of odd and even ints in the output. To overcome this, hidden state must be introduced, but that makes the function not pure and breaks the first condition (long hidden period).
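The alternating low bit is easy to demonstrate: with m a power of two and a, c both odd, a*X + c flips parity on every step. (The constants below are the widely used Numerical Recipes pair, chosen just for illustration.)

```python
a, c, m = 1664525, 1013904223, 2**32   # both a and c are odd

x = 42
low_bits = []
for _ in range(10):
    x = (a * x + c) % m
    low_bits.append(x & 1)
# odd*even + odd is odd, and odd*odd + odd is even, so the low bit of an
# LCG with power-of-two modulus strictly alternates and can never look random
print(low_bits)  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```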
EDIT: Strictly speaking, I want family of functions parametrized by p bits and with full state of n bits, each generating all possible binary strings of p + n bits in unique "randomish" way, not just continuously incrementing (p + n)-bit int. Parametrization required to select that unique way.
Am I wanting too much?
You can use any block cipher, with a fixed key. To generate the next number, decrypt the current one, increment it, and re-encrypt it. Because block ciphers are 1:1, they'll necessarily iterate through every number in the output domain before repeating.
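A sketch of that decrypt-increment-encrypt walk, using a small hand-rolled Feistel network purely as a stand-in for a real block cipher (in practice you would use AES or similar; the round function and keys here are arbitrary inventions):

```python
MASK16 = 0xFFFF
KEYS = (0x9E37, 0x79B9, 0x7F4A)   # arbitrary round keys for the toy cipher

def _round(x, k):
    # arbitrary mixing function; any deterministic function works in a Feistel
    x = (x * 2654435761 + k) & 0xFFFFFFFF
    return (x ^ (x >> 13)) & MASK16

def encrypt(block):
    l, r = block >> 16, block & MASK16
    for k in KEYS:
        l, r = r, l ^ _round(r, k)
    return (l << 16) | r

def decrypt(block):
    l, r = block >> 16, block & MASK16
    for k in reversed(KEYS):
        l, r = r ^ _round(l, k), l
    return (l << 16) | r

def next_value(y):
    # decrypt -> increment -> re-encrypt: walks a fixed permutation of the
    # full 32-bit space, so every value appears once before any repeats
    return encrypt((decrypt(y) + 1) & 0xFFFFFFFF)
```

Because the cipher is a bijection on 32-bit blocks, iterating next_value visits all 2^32 values exactly once per cycle; the key plays the role of the p parameter bits.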
Try an LFSR.
All you need is a list of primitive polynomials.
An LFSR built on a primitive polynomial of degree n generates the nonzero elements of a field of size 2^n, so its period is 2^n - 1. You can generalise this procedure to get a period of k^n - 1.
I have not seen this implemented, but all you have to implement is shifting numbers by a small number s > n where gcd(s, 2^n - 1) == 1 (gcd stands for greatest common divisor).
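For example, a Galois LFSR over GF(2) built on the primitive polynomial x^16 + x^14 + x^13 + x^11 + 1 (tap mask 0xB400) visits every nonzero 16-bit state before repeating:

```python
def lfsr16(state):
    # one Galois LFSR step for x^16 + x^14 + x^13 + x^11 + 1 (primitive
    # over GF(2)): shift right, and XOR in the taps when a 1 falls off
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

# walk the cycle from state 1 back to state 1 to measure the period
state, period = 1, 0
while True:
    state = lfsr16(state)
    period += 1
    if state == 1:
        break
print(period)  # 65535, i.e. 2^16 - 1, the maximal period
```

The step function is pure (next state depends only on the previous output/state), which matches the "no hidden state" requirement.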

How do you seed a PRNG with two seeds?

For a game that I'm making, where solar systems have x and y coordinates, I'd like to use the coordinates to randomly generate the features for that solar system. The easiest way to do this seems to be to seed a random number generator with two seeds, the x and y coordinates. Is there any way to get one reliable seed from the two, or is there a good PRNG that takes two seeds and produces long periods?
EDIT: I'm aware of binary operations between the two numbers, but I'm trying to find the method that will lead to the least number of collisions? Addition and multiplication will easily result in collisions. But what about XOR?
Why not just combine the numbers in a meaningful way to generate your seed? For example, you could add them, which could be unique enough, or perhaps stack them using a little multiplication, for example:
seed = (x << 32) + y
seed1 ^ seed2
(where ^ is the bitwise XOR operator)
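To see the collision behaviour concretely (a tiny sketch, function names mine): XOR and addition are symmetric in x and y, so mirrored coordinates collide, while packing the two 32-bit values into one 64-bit seed cannot collide at all:

```python
def seed_pack(x, y):
    # collision-free for non-negative 32-bit coordinates: each (x, y)
    # pair maps to a distinct 64-bit seed
    return (x << 32) | (y & 0xFFFFFFFF)

def seed_xor(x, y):
    # symmetric in x and y, so (3, 5) and (5, 3) collide
    return x ^ y

assert seed_xor(3, 5) == seed_xor(5, 3)    # collision
assert seed_pack(3, 5) != seed_pack(5, 3)  # distinct
```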
A simple Fibonacci PRNG uses 2 seeds, one of which should be odd. This generator uses a modulus which is a power of 10. The period is long and invariable, being 1.5 times the modulus; thus for modulus 1000000 (10^6) the period is 1,500,000.
The simple pseudocode is:
Input "Enter power for 10^n modulus"; m
Mod& = 10 ^ m
Input "Enter # of iterations"; n
Input "Enter seed #1"; a
Input "Enter seed #2"; b
For loop = 1 To n
    c = a + b
    If c >= Mod& Then c = c - Mod&
    a = b
    b = c
Next
This generator is very fast and gives an excellent uniform distribution.
Hope this helps.
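A direct Python transcription of the pseudocode above, as a generator (the seed values in the sample are mine; the period claim is the answer's):

```python
def fib_prng(a, b, mod=10**6):
    # additive Fibonacci-style generator: each output is the sum of the
    # previous two values, reduced modulo a power of 10
    while True:
        c = a + b
        if c >= mod:
            c -= mod
        a, b = b, c
        yield c

gen = fib_prng(12345, 67891)          # second seed odd, per the answer
sample = [next(gen) for _ in range(5)]
print(sample)  # [80236, 148127, 228363, 376490, 604853]
```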
Why not use some kind of super simple Fibonacci arithmetic or something like it to produce coordinates directly in base 10? Use the two starting numbers as the seeds. It won't produce random numbers suitable for Monte Carlo or anything like that, but they should be all right for a game. I'm not a programmer or a mathematician and have never tried to code anything, so I couldn't do it for you...
edit - something like f1 = some seed, then f2 = some seed, and G = (sqrt(5) + 1) / 2...
then some kind of loop: Xn = (Xn-1 + Xn-2) mod G mod 1 (which should produce a decimal between 0 and 1), and then multiply by whatever and take the least significant digits
and perhaps to prevent decay for as long as the numbers need to be produced...
an initial reseeding point at which f1 and f2 will be reseeded based on the generator's own output, which will prevent the sequence of numbers from being describable by a closed expression, so...
if counter = initial reseeding point, then f1 = Xn and f2 = Xn - something, and the reseeding point is set to ceiling(Xn * some multiplier).
so its period should end when identical values for Xn and Xn - something are re-fed into f1 and f2, which shouldn't happen for at least whatever bit length you are using for the numbers.
.... I mean, that's my best guess...
Is there a reason you want to use the coordinates? For example, do you want a system generated at a particular coordinate to always be identical to any other system generated at that same coordinate?
I would suggest using the more classical method of just seeding with the current time and using the results of that to continue generating your pseudo-randomness.
If you're adamant about using the coordinates, I would suggest concatenation (as I believe someone else suggested). At least then you're guaranteed to avoid collisions, assuming that you don't have two systems at the same co-ords.
I use one of George Marsaglia's PRNGs:
http://www.math.uni-bielefeld.de/~sillke/ALGORITHMS/random/marsaglia-c
It explicitly relies on two seeds, so it might be just what you are looking for.
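I can't verify the exact generator behind that link, but one well-known two-seed Marsaglia construction is the concatenated multiply-with-carry generator, sketched here; the x and y coordinates would go in as z and w:

```python
def mwc(z, w):
    # Marsaglia-style concatenated multiply-with-carry: each half runs its
    # own multiply-with-carry recurrence (low 16 bits are the value, high
    # 16 bits are the carry), and the two halves are glued into one output
    while True:
        z = (36969 * (z & 0xFFFF) + (z >> 16)) & 0xFFFFFFFF
        w = (18000 * (w & 0xFFFF) + (w >> 16)) & 0xFFFFFFFF
        yield ((z << 16) + w) & 0xFFFFFFFF
```

Seeding with the coordinates makes the stream reproducible per solar system; avoid seeding both z and w with 0.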

Entropy repacking

I have been tossing around a conceptual idea for a machine (as in a Turing machine) and I'm wondering if any work has been done on this or related topics.
The idea is a machine that takes an entropy stream and gives out random symbols in any range without losing any entropy.
I'll grant that that is a far from rigorous description, so I'll give an example: say I have a generator of random symbols in the range 1 to n, and I want to be able to ask for symbols in any given range, first 1 to 12 and then 1 to 1234. (To keep it practical I'll only consider deterministic machines where, given the same input stream and requests, the machine will always give the same output.) One necessary constraint is that the output contain at least as much entropy as the input. However, the constraint I'm most interested in is that the machine only reads in as much entropy as it spits out.
E.g. if asked for tokens in the ranges 1 to S1, S2, S3, ..., Sm, it would only consume ceiling(sum(i = 1 to m, log(Si)) / log(n)) input tokens.
This question asks about how to do this conversion while satisfying the first constraint but does very badly on the second.
Okay, I'm still not sure that I'm following what you want. It sounds like you want a function
f: I → O
where the inputs are a strongly random (uniform distribution etc) sequence of symbols on an alphabet I={1..n}. (So a series of random natural numbers ≤ n.) The outputs are another sequence on O={1..m} and you want that sequence to have as much entropy as the inputs.
Okay, if I've got this right, first off, if m < n, you can't. If m < n then lg m < lg n, so the entropy of the set of output symbols is smaller.
If m ≥ n, then you can do it trivially by just selecting the ith element of {1..m}. Entropy will be the same, since the number of possible output symbols is the same. They aren't going to be "random" in the sense of being uniformly distributed over the whole set {1..m}, though, because necessarily (pigeonhole principle) some symbols won't be selected at all.
If, on the other hand, you'd be satisfied with having a random sequence on {1..m}, then you can do it by selecting an appropriate pseudorandom number generator using your input from the random source as a seed.
My current pass at it:
By adding the following restriction -- you know in advance what the sequence of ranges {S1, S2, S3, ..., Sn} is -- base translation with a non-constant base might work:
Find Sp = S1 * S2 * S3 * ... * Sn
Extract m = ceiling(log(Sp)/log(n)) terms from the input: {R1, R2, R3, ..., Rm}
Form X = R1 + R2*n + R3*n^2 + ... + Rm*n^(m-1)
Re-express X as O1 + S1*O2 + S1*S2*O3 + ... + S1*S2*...*S(n-1)*On + x, where 1 <= Oi <= Si
This might be reformable into a solution that works for one value at a time by pushing x back into the input stream. However, I can't convince myself that even the known-output-ranges form is sound, so...
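A sketch of the batched base-translation above, using 0-based symbols for simplicity (inputs in [0, n), outputs O_i in [0, S_i)); the name `repack` and the leftover handling are my own framing:

```python
import math

def repack(stream, ranges, n):
    # batch just enough base-n input symbols to cover prod(ranges) outcomes
    Sp = math.prod(ranges)
    m = math.ceil(math.log(Sp) / math.log(n))
    X = 0
    for i in range(m):
        X += next(stream) * n**i        # X = R1 + R2*n + R3*n^2 + ...
    out = []
    for S in ranges:                    # peel off mixed-radix digits
        X, o = divmod(X, S)
        out.append(o)                   # O_i in [0, S_i)
    return out, X                       # X is the leftover "x" term
```

For example, three base-8 symbols [3, 1, 4] cover the ranges [12, 7] (since 8^3 >= 84) and unpack to the output symbols [3, 1] with leftover 3.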
