I don't mean a function that generates random numbers, but an algorithm to generate a random function
"High dimension" means the function is multi-variable, e.g. a 100-dim function has 100 different variables.
Let's say the domain is [0,1]; we need to generate a function f: [0,1]^n -> [0,1]. This function should be chosen from a certain class of functions so that every function in the class is equally likely to be chosen.
(This class of functions can be either all continuous functions, or all functions with continuous K-th order derivatives, whichever is more convenient for the algorithm.)
Since the functions on a closed interval domain are uncountably infinite, we only require the algorithm to be pseudo-random.
Is there a polynomial time algorithm to solve this problem?
I just want to add a possible algorithm to the question (not feasible, though, due to its exponential time complexity). The algorithm was proposed by the friend who actually brought up this question in the first place:
The algorithm can be described simply as follows. First, assume the dimension d = 1 and consider smooth functions on the interval I = [a, b]. Split the domain [a, b] into N small intervals. For each interval I_i, generate a random number f_i drawn from some specified distribution (Gaussian or uniform). Finally, interpolate the points (a_i, f_i), where a_i is a characteristic point of I_i (e.g., we can choose a_i as the midpoint of I_i). After interpolation, we obtain a smooth curve, which can be regarded as a one-dimensional random function living in the function space C^m[a, b] (where m depends on the interpolation algorithm we choose).
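To make that concrete, here is a small Python sketch of the construction (the spline routine, node placement, and parameters are my own illustrative choices):

import numpy as np
from scipy.interpolate import CubicSpline

def random_function_1d(a=0.0, b=1.0, n_intervals=50, rng=None):
    """Sample f_i at the midpoints a_i of N subintervals of [a, b] and
    interpolate with a cubic spline, giving a C^2 random curve."""
    rng = np.random.default_rng(rng)
    edges = np.linspace(a, b, n_intervals + 1)
    midpoints = (edges[:-1] + edges[1:]) / 2          # characteristic points a_i
    values = rng.uniform(0.0, 1.0, size=n_intervals)  # random f_i
    return CubicSpline(midpoints, values)             # may overshoot [0,1] slightly between nodes

f = random_function_1d(rng=42)
print(f(0.25), f(0.5), f(0.75))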
This is just to say that the algorithm does not need to be that formal and rigorous, but simply to provide something that works.
So if I get it right, you need a function returning a scalar from a vector.
The easiest way I see is to use a dot product.
For example, let n be the dimensionality you need.
Create a random vector a[n] containing random coefficients in the range <0,1> whose sum is 1:
create float a[n]
fill it with positive random numbers (no zeros)
compute the sum of all a[i]
divide each a[i] by this sum
now the function y=f(x[n]) is simply
y=dot(a[n],x[n])=a[0]*x[0]+a[1]*x[1]+...+a[n-1]*x[n-1]
If I didn't miss something, the target range should be <0,1>:
if x==(0,0,0,..0) then y=0;
if x==(1,1,1,..1) then y=1;
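A minimal sketch of this in Python (the helper name is mine):

import random

def make_random_linear(n, rng=random):
    # random positive coefficients, normalized so they sum to 1
    a = [rng.uniform(1e-6, 1.0) for _ in range(n)]
    s = sum(a)
    a = [ai / s for ai in a]
    # y = dot(a, x), which maps [0,1]^n into [0,1]
    return lambda x: sum(ai * xi for ai, xi in zip(a, x))

f = make_random_linear(100)
print(f([0.0] * 100))  # 0.0
print(f([1.0] * 100))  # 1.0 (up to rounding)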
If you need something more complex, use a higher-order polynomial,
something like y=dot(a0[n],x[n])*dot(a1[n],x[n]^2)*dot(a2[n],x[n]^3)...
where x[n]^2 means (x[0]*x[0],x[1]*x[1],...)
Both approaches result in a function with the same "direction":
if any x[i] rises then y rises too
If you want to change that, then you also have to allow negative values in a[],
but to make that work you need to add some offset to y to shift it away from negative values,
and the a[] normalization process becomes a bit more complex,
because you need to find the min and max values.
An easier option is to add a random flag vector m[n] to the process:
m[i] flags whether 1-x[i] should be used instead of x[i].
This way everything above stays as is (see the sketch below).
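A sketch combining the higher-order product with the flag-vector mapping (my own naming, just one way to realize the idea):

import random

def make_random_poly(n, order=3, rng=random):
    def norm_coeffs():
        a = [rng.uniform(1e-6, 1.0) for _ in range(n)]
        s = sum(a)
        return [ai / s for ai in a]
    coeff_sets = [norm_coeffs() for _ in range(order)]   # a0[], a1[], a2[], ...
    m = [rng.random() < 0.5 for _ in range(n)]           # flag vector
    def f(x):
        z = [1.0 - xi if flip else xi for xi, flip in zip(x, m)]
        y = 1.0
        for k, a in enumerate(coeff_sets, start=1):
            y *= sum(ai * zi ** k for ai, zi in zip(a, z))   # dot(ak, z^k)
        return y                                             # stays in [0,1]
    return f

g = make_random_poly(5)
print(g([0.2, 0.9, 0.4, 0.7, 0.1]))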
You can create more types of mapping to make it even more variable.
This might not only be hard, but impossible if you actually want to be able to generate every continuous function.
For the one-dimensional case you might be able to create a useful approximation by looking into the Faber-Schauder system (see also the wiki). This gives you a Schauder basis for the continuous functions on an interval. Such a basis only covers the whole vector space if you include infinite linear combinations of basis vectors. Thus you can create some random functions by building random linear combinations from this basis, but in general you won't be able to create functions that can only be represented by an infinite number of basis vectors this way.
Edit in response to your update:
It seems like choosing a random polynomial function of order K (for the class of K-times differentiable functions) might be sufficient for you, since any such function can be approximated (around a given point) by a polynomial (see Taylor's theorem). Choosing a random polynomial function is easy: just pick K+1 random real numbers as the coefficients of your polynomial. (Note that this will, for example, not return functions similar to abs(x).)
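A one-variable sketch of that (the coefficient range is an arbitrary choice):

import random

def random_polynomial(k, coeff_range=(-1.0, 1.0), rng=random):
    # pick K+1 random coefficients c_0 .. c_K
    coeffs = [rng.uniform(*coeff_range) for _ in range(k + 1)]
    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs))

p = random_polynomial(4)
print(p(0.5))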
Many randomized algorithms and data structures (such as the Count-Min Sketch) require hash functions with the pairwise independence property. Intuitively, this means that the probability of a hash collision with a specific element is small, even if the output of the hash function for that element is known.
I have found many descriptions of pairwise independent hash functions for fixed-length bitvectors based on random linear functions. However, I have not yet seen any examples of pairwise independent hash functions for strings.
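For concreteness, here is the kind of fixed-length construction I mean, sketched in Python (the multiply-add family; the prime and parameters are illustrative):

import random

# Classic fixed-length construction: h_{a,b}(x) = (a*x + b) mod p, with p prime
# and a, b drawn uniformly from [0, p). For integer keys x < p this family is
# pairwise independent; reducing further mod m (to get a small table index)
# makes it only approximately so.
P = (1 << 61) - 1  # a Mersenne prime, assumed larger than any key

def make_hash(m, rng=random):
    a = rng.randrange(P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

h = make_hash(1024)
print(h(42), h(43))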
Are there any families of pairwise independent hash functions for strings?
I'm pretty sure they exist, but there's a bit of measure-theoretic subtlety to your question. You might be better off asking on mathoverflow. I'm very rusty with this stuff, but I think I can show that, even if they do exist, you don't actually want one.
To begin with, you need a probability measure on the strings, and any such measure will necessarily look very different from any notion of "uniform." (It's a countable set and all the sigma-algebras over countable sets just clump together sets of elements and assign a probability to each of those sets. You'll want all of the clumps to be singletons.)
Now, if you only give finitely many strings positive probability, you're back in the finite case. So let's ignore that for now and assume that, for any epsilon > 0, you can find a string whose probability is strictly between 0 and epsilon.
Suppose we restrict to the case where the hash functions map strings to {0,1}.
Your family of hash functions will need to be infinite as well and you'll want to talk about it as a probability space of hash functions. If you have a set H of hash functions that has positive probability, then every string is mapped to both 0 and 1 by (different) elements of H. In particular, no single element of H has positive probability. So H has to be uncountable and you've suddenly run into difficult representability issues.
I'd be very happy if someone who hasn't forgotten measure theory would chime in here.
Not with a seed of bounded length and an output of nonzero bounded length.
A fairly crude argument to this effect: for a finite family of hash functions H, consider the map f from an element x to the tuple giving h(x) for every h in H. Since the codomain of each h, and hence of f, is finite while there are infinitely many strings, the pigeonhole principle gives two strings mapped the same way by every h in H, which, given that there are at least two possible hash values, contradicts pairwise independence.
Is there any good invertible 1-1 function that maps an integer to another integer?
For example, given the range 0-5, I want to find one that maps:
0->3
1->2
2->4
3->5
4->1
5->0
Also, the mapping should look random.
You can fill an array in ascending order and shuffle it. This will usually perform reasonably well, even if it is not the most memory-efficient option.
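A small sketch of that approach, keeping both directions of the mapping (helper names are mine):

import random

def random_bijection(n, seed=None):
    rng = random.Random(seed)
    forward = list(range(n))
    rng.shuffle(forward)            # forward[i] is where i maps to
    inverse = [0] * n
    for i, v in enumerate(forward):
        inverse[v] = i              # inverse undoes the mapping
    return forward, inverse

fwd, inv = random_bijection(6)
print(fwd)                          # some permutation of 0..5
print([inv[v] for v in fwd])        # [0, 1, 2, 3, 4, 5]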
You can also rely on a closed discrete transformation, such as modular exponentiation. If you have two numbers P and K, with K prime and P a primitive root modulo K, then P^n mod K will produce a nonrepeating, pseudorandom sequence of length K - 1, covering the values 1 to K - 1. This particular manifestation of discrete math is one of the premises of cryptography. Going backwards from a value in the sequence to its exponent is known as the discrete logarithm problem and is the reason schemes like Diffie-Hellman are secure.
You asked for a reversible algorithm. If you keep track of the exponent, you can go from P^n mod K to P^(n-1) mod K without much difficulty. You can take a few shortcuts to go backwards from power to exponent that don't work in cryptography because certain parameters of the algorithm are intentionally discarded to make it harder.
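A tiny sketch with concrete numbers (K = 7 is prime and P = 3 is a primitive root modulo 7, so the cycle has length 6):

# P^n mod K with K prime and P a primitive root modulo K
P, K = 3, 7
sequence = [pow(P, n, K) for n in range(1, K)]
print(sequence)          # [3, 2, 6, 4, 5, 1] -- each of 1..6 exactly once

# Going backwards (value -> exponent) is the discrete logarithm;
# for small K a lookup table is enough.
dlog = {pow(P, n, K): n for n in range(1, K)}
print(dlog[6])           # 3, because 3^3 mod 7 == 6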
That said, if you happen to break Diffie-Hellman by solving the discrete log problem while you're working on this, be sure to let me know.
How about permutation polynomials? See section 3 in this article: http://webstaff.itn.liu.se/~stegu/jgt2012/article.pdf It is used for noise there, but it looks exactly like what you want.
It suggests constructing functions of the form (Ax^2 + Bx) mod M. Only a small subset of those functions are invertible / produce permutations, but it shouldn't be hard to find the actual inverse when it exists.
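A sketch that brute-force checks whether a given (A, B, M) yields a permutation and, if so, tabulates the inverse (the constants below are just an illustrative guess, not taken from the article):

def perm_poly(a, b, m):
    """Return the mapping x -> (a*x*x + b*x) % m if it is a permutation
    of 0..m-1, together with its inverse table; otherwise return None."""
    image = [(a * x * x + b * x) % m for x in range(m)]
    if len(set(image)) != m:
        return None  # not invertible for these parameters
    inverse = [0] * m
    for x, y in enumerate(image):
        inverse[y] = x
    return image, inverse

result = perm_poly(6, 1, 9)   # illustrative constants
if result:
    fwd, inv = result
    print(fwd)   # a permutation of 0..8
    print(inv)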
Something similar to this was discussed in Non-repetitive random seek in a range Algorithm. I was intrigued enough to put some ideas down at http://www.mcdowella.demon.co.uk/PermutationFromHash.html
You can generate such a permutation using a block cipher, without having to hold the entire thing in memory (as you would if you were to shuffle the list). I wrote a blog post about it some time ago, which you can find here.
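For illustration, the usual trick here is a small Feistel network plus cycle walking; a rough sketch (the round function and parameters are arbitrary toy choices, not a real cipher and not necessarily what the blog post uses):

import hashlib

def feistel_permute(i, key, n, rounds=4):
    """Map i in [0, n) to a unique value in [0, n) without storing the whole
    permutation: a toy balanced Feistel network plus cycle walking."""
    half = max(1, ((n - 1).bit_length() + 1) // 2)
    mask = (1 << half) - 1

    def round_fn(r, x):
        data = f"{key}:{r}:{x}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") & mask

    def encrypt(x):                       # bijection on [0, 2**(2*half))
        left, right = x >> half, x & mask
        for r in range(rounds):
            left, right = right, left ^ round_fn(r, right)
        return (left << half) | right

    x = encrypt(i)
    while x >= n:                         # cycle walking: stay inside [0, n)
        x = encrypt(x)
    return x

print(sorted(feistel_permute(i, key="demo", n=10) for i in range(10)))  # 0..9, each once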
Just as background, I'm aware of the Fisher-Yates perfect shuffle. It is a great shuffle with its O(n) complexity and its guaranteed uniformity and I'd be a fool not to use it ... in an environment that permits in-place updates of arrays (so in most, if not all, imperative programming environments).
Sadly the functional programming world doesn't give you access to mutable state.
Because of Fisher-Yates, however, there's not a lot of literature I can find on how to design a shuffling algorithm. The few places that address it at all do so briefly before saying, in effect, "so here's Fisher-Yates which is all the shuffling you need to know". I had to, in the end, come up with my own solution.
The solution I came up with works like this to shuffle any list of data:
If the list is empty, return the empty list.
If the list has a single item, return that single item.
If the list is non-empty, partition the list with a random number generator and apply the algorithm recursively to each partition, assembling the results.
In Erlang code it looks something like this:
shuffle([])  -> [];
shuffle([L]) -> [L];
shuffle(L)   ->
  {Left, Right} = lists:partition(fun(_) ->
                      random:uniform() < 0.5
                    end, L),
  shuffle(Left) ++ shuffle(Right).
(If this looks like a deranged quick sort to you, well, that's what it is, basically.)
So here's my problem: the same situation that makes finding shuffling algorithms that aren't Fisher-Yates difficult makes finding tools to analyse a shuffling algorithm equally difficult. There's lots of literature I can find on analysing PRNGs for uniformity, periodicity, etc. but not a lot of information out there on how to analyse a shuffle. (Indeed some of the information I found on analysing shuffles was just plain wrong -- easily deceived through simple techniques.)
So my question is this: how do I analyse my shuffling algorithm (assuming that the random:uniform() call up there is up to the task of generating appropriate random numbers with good characteristics)? What mathematical tools are at my disposal to judge whether or not, say, 100,000 runs of the shuffler over a list of integers ranging over 1..100 has given me plausibly good shuffling results? I've done a few tests of my own (comparing increments to decrements in the shuffles, for example), but I'd like to know a few more.
And if there's any insight into that shuffle algorithm itself that would be appreciated too.
General remark
My personal approach to the correctness of probability-using algorithms: if you know how to prove it's correct, then it's probably correct; if you don't, it's certainly wrong.
Said differently, it's generally hopeless to try to analyse every algorithm you could come up with: you have to keep looking for an algorithm until you find one that you can prove correct.
Analysing a random algorithm by computing the distribution
I know of one way to "automatically" analyse a shuffle (or, more generally, a random-using algorithm) that is stronger than simply "throw lots of tests at it and check for uniformity": you can mechanically compute the distribution associated with each input of your algorithm.
The general idea is that a random-using algorithm explores a part of a world of possibilities. Each time your algorithm asks for a random element in a set ({true, false} when flipping a coin), there are two possible outcomes for your algorithm, and one of them is chosen. You can change your algorithm so that, instead of returning one of the possible outcomes, it explores all solutions in parallel and returns all possible outcomes with the associated distributions.
In general, that would require rewriting your algorithm in depth. If your language supports delimited continuations, you don't have to: you can implement the "exploration of all possible outcomes" inside the function asking for a random element (the idea is that the random generator, instead of returning a result, captures the continuation associated with your program and runs it with all the different results). For an example of this approach, see oleg's HANSEI.
An intermediate, and probably less arcane, solution is to represent this "world of possible outcomes" as a monad, and use a language such as Haskell with facilities for monadic programming. Here is an example implementation of a variant¹ of your algorithm, in Haskell, using the probability monad of the probability package:
import Numeric.Probability.Distribution

shuffleM :: (Num prob, Fractional prob) => [a] -> T prob [a]
shuffleM [] = return []
shuffleM [x] = return [x]
shuffleM (pivot:li) = do
    (left, right) <- partition li
    sleft <- shuffleM left
    sright <- shuffleM right
    return (sleft ++ [pivot] ++ sright)
  where partition [] = return ([], [])
        partition (x:xs) = do
          (left, right) <- partition xs
          uniform [(x:left, right), (left, x:right)]
You can run it for a given input and get the output distribution:
*Main> shuffleM [1,2]
fromFreqs [([1,2],0.5),([2,1],0.5)]
*Main> shuffleM [1,2,3]
fromFreqs
[([2,1,3],0.25),([3,1,2],0.25),([1,2,3],0.125),
([1,3,2],0.125),([2,3,1],0.125),([3,2,1],0.125)]
You can see that this algorithm is uniform with inputs of size 2, but non-uniform on inputs of size 3.
The difference with the test-based approach is that we can gain absolute certainty in a finite number of steps: that number can be quite big, as it amounts to an exhaustive exploration of the world of possibilities (but generally smaller than 2^N, as similar outcomes can be factored together), but if it returns a non-uniform distribution we know for sure that the algorithm is wrong. Of course, if it returns a uniform distribution for [1..N] with 1 <= N <= 100, you only know that your algorithm is uniform up to lists of size 100; it may still be wrong.
¹: this algorithm is a variant of your Erlang implementation, because of the specific pivot handling. If I use no pivot, as in your case, the input size doesn't decrease at each step anymore: the algorithm also considers the case where all inputs end up in the left list (or the right list), and gets lost in an infinite loop. This is a weakness of the probability monad implementation (if an algorithm has probability 0 of non-termination, the distribution computation may still diverge) that I don't yet know how to fix.
Sort-based shuffles
Here is a simple algorithm that I feel confident I could prove correct:
Pick a random key for each element in your collection.
If the keys are not all distinct, restart from step 1.
Sort the collection by these random keys.
You can omit step 2 if you know the probability of a collision (two random numbers picked are equal) is sufficiently low, but without it the shuffle is not perfectly uniform.
If you pick your keys in [1..N] where N is the length of your collection, you'll have lots of collisions (Birthday problem). If you pick your key as a 32-bit integer, the probability of conflict is low in practice, but still subject to the birthday problem.
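As a rough back-of-the-envelope check of that claim (a sketch, not part of the original argument):

import math

def collision_probability(n, d):
    # approximate birthday bound: P(collision) ~ 1 - exp(-n*(n-1) / (2*d))
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * d))

print(collision_probability(1000, 1000))     # keys in [1..N]: collision nearly certain
print(collision_probability(1000, 2 ** 32))  # 32-bit keys: around 1e-4, small but nonzero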
If you use infinite (lazily evaluated) bitstrings as keys, rather than finite-length keys, the probability of a collision becomes 0, and checking for distinctness is no longer necessary.
Here is a shuffle implementation in OCaml, using lazy real numbers as infinite bitstrings:
type 'a stream = Cons of 'a * 'a stream lazy_t

let rec real_number () =
  Cons (Random.bool (), lazy (real_number ()))

let rec compare_real a b = match a, b with
  | Cons (true, _), Cons (false, _) -> 1
  | Cons (false, _), Cons (true, _) -> -1
  | Cons (_, lazy a'), Cons (_, lazy b') ->
      compare_real a' b'

let shuffle list =
  List.map snd
    (List.sort (fun (ra, _) (rb, _) -> compare_real ra rb)
       (List.map (fun x -> real_number (), x) list))
There are other approaches to "pure shuffling". A nice one is apfelmus's mergesort-based solution.
Algorithmic considerations: the complexity of the previous algorithm depends on the probability that all keys are distinct. If you pick them as 32-bit integers, you have a one in ~4 billion probability that a particular key collides with another key. Sorting by these keys is O(n log n), assuming picking a random number is O(1).
If you use infinite bitstrings, you never have to restart picking, but the complexity is then related to "how many elements of the streams are evaluated on average". I conjecture it is O(log n) on average (hence still O(n log n) in total), but have no proof.
... and I think your algorithm works
After more reflection, I think (like douplep) that your implementation is correct. Here is an informal explanation.
Each element in your list is tested by several random:uniform() < 0.5 tests. To an element, you can associate the list of outcomes of those tests, as a list of booleans or {0, 1}. At the beginning of the algorithm, you don't know the list associated with any of those elements. After the first partition call, you know the first element of each list, and so on. When your algorithm returns, the lists of tests are completely known and the elements are sorted according to those lists (sorted in lexicographic order, or considered as binary representations of real numbers).
So, your algorithm is equivalent to sorting by infinite bitstring keys. The action of partitioning the list, reminiscent of quicksort's partition over a pivot element, is actually a way of separating, for a given position in the bitstring, the elements with valuation 0 from the elements with valuation 1.
The sort is uniform because the bitstrings are all different. Indeed, two elements with real numbers equal up to the n-th bit are on the same side of a partition occurring during a recursive shuffle call of depth n. The algorithm only terminates when all the lists resulting from partitions are empty or singletons: all elements have been separated by at least one test, and therefore have distinct binary expansions.
Probabilistic termination
A subtle point about your algorithm (or my equivalent sort-based method) is that the termination condition is probabilistic. Fisher-Yates always terminates after a known number of steps (the number of elements in the array). With your algorithm, the termination depends on the output of the random number generator.
There are possible outputs that would make your algorithm diverge, not terminate. For example, if the random number generator always outputs 0, each partition call will return the input list unchanged, and you recursively call the shuffle on it: you will loop indefinitely.
However, this is not an issue if you're confident that your random number generator is fair: it does not cheat and always returns independent, uniformly distributed results. In that case, the probability that the test random:uniform() < 0.5 always returns true (or false) is exactly 0:
the probability that the first N calls return true is 2^{-N}
the probability that all calls return true is the probability of the infinite intersection, over all N, of the events that the first N calls return true; it is the infimum limit¹ of the 2^{-N}, which is 0
¹: for the mathematical details, see http://en.wikipedia.org/wiki/Measure_(mathematics)#Measures_of_infinite_intersections_of_measurable_sets
More generally, the algorithm does not terminate if and only if some of the elements get associated with the same boolean stream, i.e. at least two elements have the same boolean stream. But the probability that two random boolean streams are equal is again 0: the probability that the digits at position K are equal is 1/2, so the probability that the first N digits are equal is 2^{-N}, and the same analysis applies.
Therefore, you know that your algorithm terminates with probability 1. This is a slightly weaker guarantee than that of the Fisher-Yates algorithm, which always terminates. In particular, you're vulnerable to an attack by an evil adversary that controls your random number generator.
With more probability theory, you could also compute the distribution of running times of your algorithm for a given input length. This is beyond my technical abilities, but I assume it's good: I suppose that you only need to look at the first O(log N) digits on average to check that all N lazy streams are different, and that the probability of much higher running times decreases exponentially.
Your algorithm is a sort-based shuffle, as discussed in the Wikipedia article.
Generally speaking, the computational complexity of sort-based shuffles is the same as that of the underlying sort algorithm (e.g. O(n log n) average, O(n²) worst case for a quicksort-based shuffle), and while the distribution is not perfectly uniform, it should be close enough to uniform for most practical purposes.
Oleg Kiselyov provides the following article / discussion:
Provably perfect random shuffling and its pure functional implementations
which covers the limitations of sort-based shuffles in more detail, and also offers two adaptations of the Fisher–Yates strategy: a naive O(n²) one, and a binary-tree-based O(n log n) one.
Sadly the functional programming world doesn't give you access to mutable state.
This is not true: while purely functional programming avoids side effects, it supports access to mutable state with first-class effects, without requiring side effects.
In this case, you can use Haskell's mutable arrays to implement the mutating Fisher–Yates algorithm as described in this tutorial:
Haskell Shuffling (Brett Hall)
Addendum
The specific foundation of your sort-based shuffle is actually an infinite-key radix sort: as gasche points out, each partition corresponds to a digit grouping.
The main disadvantage of this is the same as any other infinite-key sorting shuffle: there is no termination guarantee. Although the likelihood of termination increases as the comparison proceeds, there is never an upper bound: the worst-case complexity is O(∞).
I was doing some stuff similar to this a while ago, and in particular you might be interested in Clojure's vectors, which are functional and immutable but still with O(1) random access/update characteristics. These two gists have several implementations of a "take N elements at random from this M-sized list"; at least one of them turns into a functional implementation of Fisher-Yates if you let N=M.
https://gist.github.com/805546
https://gist.github.com/805747
Based on How to test randomness (case in point - Shuffling), I propose:
Shuffle (medium-sized) arrays composed of equal numbers of zeroes and ones. Repeat and concatenate until bored. Use these as input to the diehard tests. If you have a good shuffle, then you should be generating random sequences of zeroes and ones (with the caveat that the cumulative excess of zeroes (or ones) is zero at the boundaries of the medium-sized arrays, which you would hope the tests detect; the larger "medium" is, the less likely they are to do so).
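A sketch of how that bit stream could be produced (Python's built-in shuffle stands in for the shuffler under test):

import random

def shuffle_bits(block_size=1000, blocks=10000, shuffle=random.shuffle):
    """Concatenate shuffles of arrays with equal numbers of zeroes and ones;
    the resulting stream is what you'd feed to diehard/dieharder."""
    out = []
    for _ in range(blocks):
        block = [0] * (block_size // 2) + [1] * (block_size // 2)
        shuffle(block)               # replace with the shuffler under test
        out.extend(block)
    return out

bits = shuffle_bits(block_size=100, blocks=100)
print(sum(bits), len(bits))          # equal zeroes and ones by construction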
Note that a test can reject your shuffle for three reasons:
the shuffle algorithm is bad,
the random number generator used by the shuffler or during initialization is bad, or
the test implementation is bad.
You'll have to resolve which is the case if any test rejects.
Various adaptations of the diehard tests (to resolve certain numbers, I used the source from the diehard page). The principal mechanism of adaptation is to make the shuffle algorithm act as a source of uniformly distributed random bits.
Birthday spacings: In an array of n zeroes, insert log n ones. Shuffle. Repeat until bored. Construct the distribution of inter-one distances, compare with the exponential distribution. You should perform this experiment with different initialization strategies -- the ones at the front, the ones at the end, the ones together in the middle, the ones scattered at random. (The latter has the greatest hazard of a bad initialization randomization (with respect to the shuffling randomization) yielding rejection of the shuffling.) This can actually be done with blocks of identical values, but has the problem that it introduces correlation in the distributions (a one and a two can't be at the same location in a single shuffle).
Overlapping permutations: shuffle five values a bunch of times (a sketch of this one appears after these adaptations). Verify that the 120 outcomes are about equally likely. (Chi-squared test, 119 degrees of freedom -- the diehard test (cdoperm5.c) uses 99 degrees of freedom, but this is (mostly) an artifact of sequential correlation caused by using overlapping subsequences of the input sequence.)
Ranks of matrices: from 2*(6*8)^2 = 4608 bits from shuffling equal numbers of zeroes and ones, select 6 non-overlapping 8-bit substrings. Treat these as a 6-by-8 binary matrix and compute its rank. Repeat for 100,000 matrices. (Pool together ranks of 0-4. Ranks are then either 6, 5, or 0-4.) The expected fraction of ranks is 0.773118, 0.217439, 0.009443. Chi-squared compare with observed fractions with two degrees of freedom. The 31-by-31 and 32-by-32 tests are similar. Ranks of 0-28 and 0-29 are pooled, respectively. Expected fractions are 0.2887880952, 0.5775761902, 0.1283502644, 0.0052854502. Chi-squared test has three degrees of freedom.
and so on...
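For instance, the overlapping-permutations adaptation could be sketched like this (again with Python's built-in shuffle as a stand-in for the algorithm under test):

import itertools
import random

def permutation_chi2(shuffle=random.shuffle, trials=120000):
    """Shuffle [0..4] many times and compute the chi-squared statistic of
    the 120 outcome counts against the uniform expectation (119 df)."""
    counts = {p: 0 for p in itertools.permutations(range(5))}
    for _ in range(trials):
        values = list(range(5))
        shuffle(values)              # replace with the shuffler under test
        counts[tuple(values)] += 1
    expected = trials / 120.0
    return sum((c - expected) ** 2 / expected for c in counts.values())

chi2 = permutation_chi2()
# Compare with the chi-squared critical value for 119 degrees of freedom
# (about 146 at the 5% level); much larger values indicate a biased shuffle.
print(chi2)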
You may also wish to leverage dieharder and/or ent to make similar adapted tests.