Calculating unique value from given numbers - algorithm

Let's say I have some 6 random numbers and I want to calculate some unique value from these numbers.
Edit:
Allowed operations are +, -, *, and /. Every number may be used only once. You don't have to use all of the numbers.
Example:
Given numbers: 3, 6, 100, 50, 25, 75
Requested result: 953
3 + 6 = 9
9 * 100 = 900
900 + 50 = 950
75 / 25 = 3
3 + 950 = 953
What would be the easiest algorithmic approach to write a program that solves this problem?

The easiest approach is to try them all: you have six numbers, meaning that there are up to five spots where you can place an operator, and up to 6! permutations. Given that there are only four operators, you need to go through 6!*4^5, or 737280 possibilities. This can be easily done with a recursive function, or even with nested loops. Depending on the language, you could use a library function to deal with permutations.
A language-agnostic recursive approach would have you define three functions:
int calc(int nums[6], int ops[5], int countNums) {
    // Calculate the result for a given sequence of numbers
    // combined with the specified operators.
    // nums are your numbers; only the first countNums are used
    // ops are your operators; only the first countNums-1 are used
    // countNums is the number of items to use; it must be from 1 to 6
}

void permutations(int nums[6], int perm[6], int pos) {
    // Produces all permutations of the original numbers.
    // nums are the original numbers
    // perm, 0 through pos, holds the indexes of nums used in the permutation so far
    // pos is the number of perm items filled so far
}

void solveRecursive(int numPerm[6], int permLen, int ops[5], int pos) {
    // Tries all combinations of operations on the given permutation.
    // numPerm is the permutation of the original numbers
    // permLen is the number of items used in the permutation
    // ops, 0 through pos, are the operators to be placed between elements
    // of the permutation
    // pos is the number of operators provided so far
}

The easiest algorithmic approach would, I think, be backtracking. It's fairly easy to implement and will always find a solution if one exists. The basic idea is recursive: make an arbitrary choice at each step of building a solution and proceed from there. If it doesn't work out, try a different choice. When you run out of choices, report failure to the previous choice point (or report failure to find a solution if there is no previous choice point).
Your choices are: how many numbers will be involved, what each number is (a choice for each number position), and how they are connected by operators (a choice for each operator position).
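As a rough illustration of that idea (not tied to any particular answer's code), here is a minimal backtracking sketch in C++. It keeps a pool of the remaining numbers, tries every way of combining two of them with one operator, and undoes the choice if it leads nowhere. Restricting division to exact divisions is an extra assumption made here to keep intermediate values integral.

#include <iostream>
#include <vector>

// Returns true if some expression over a subset of pool evaluates to target.
bool solve(std::vector<long long> pool, long long target) {
    for (long long v : pool)
        if (v == target) return true;              // target already present in the pool
    for (size_t i = 0; i < pool.size(); ++i)
        for (size_t j = 0; j < pool.size(); ++j) {
            if (i == j) continue;
            long long a = pool[i], b = pool[j];
            std::vector<long long> rest;
            for (size_t k = 0; k < pool.size(); ++k)
                if (k != i && k != j) rest.push_back(pool[k]);
            std::vector<long long> candidates = {a + b, a - b, a * b};
            if (b != 0 && a % b == 0) candidates.push_back(a / b);   // exact division only (assumption)
            for (long long c : candidates) {
                rest.push_back(c);                 // make a choice
                if (solve(rest, target)) return true;
                rest.pop_back();                   // it did not work out: backtrack
            }
        }
    return false;                                  // no choice at this point works
}

int main() {
    std::cout << solve({3, 6, 100, 50, 25, 75}, 953) << "\n";   // prints 1
}

Extending it to also record the operations used (so the expression can be printed) only requires carrying the textual form of each pool entry alongside its value.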

When you mention a "unique value", I assume you mean a result within the possible universe of results that can be generated from the numbers at hand.
If so, why not try all permutations of the available numbers and operators for a start?

If you want to guarantee that you generate a unique number from those numbers, with no chance of getting the same number from a different set of numbers, then you should use radix arithmetic, similar to decimal, hex, etc.
But you need to know the max values of the numbers.
Basically, it would be A + B * MAX_A + C * MAX_A * MAX_B + D * MAX_A * MAX_B * MAX_C + E * MAX_A * MAX_B * MAX_C * MAX_D + F * MAX_A * ... * MAX_E
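For what it's worth, here is a minimal sketch of that mixed-radix encoding; the names and the vector-based interface are my own, and it assumes each value lies in [0, max) for its position, which is what makes the encoding collision free and reversible.

#include <vector>

// values[i] must lie in [0, maxes[i])
long long encode(const std::vector<int>& values, const std::vector<int>& maxes) {
    long long code = 0, weight = 1;
    for (size_t i = 0; i < values.size(); ++i) {
        code += values[i] * weight;   // A + B*MAX_A + C*MAX_A*MAX_B + ...
        weight *= maxes[i];
    }
    return code;
}

std::vector<int> decode(long long code, const std::vector<int>& maxes) {
    std::vector<int> values(maxes.size());
    for (size_t i = 0; i < maxes.size(); ++i) {
        values[i] = int(code % maxes[i]);
        code /= maxes[i];
    }
    return values;
}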

Use recursion to permute the numbers and operators; it's O(6! * 4^5).

Related

Best way to generate U(1,5) from U(1,3)?

I am given a uniform integer random number generator ~ U3(1,3) (inclusive). I would like to generate integers ~ U5(1,5) (inclusive) using U3. What is the best way to do this?
The simplest approach I can think of is to sample twice from U3 and then use rejection sampling. I.e., sampling twice from U3 gives us 9 possible combinations. We can assign the first 5 combinations to 1,2,3,4,5, and reject the last 4 combinations.
This approach expects to sample from U3 9/5 * 2 = 18/5 = 3.6 times.
Another approach could be to sample three times from U3. This gives us a sample space of 27 possible combinations. We can make use of 25 of these combinations and reject the last 2. This approach expects to use U3 27/25 * 3 = 3.24 times. It is a little more tedious to write out, since there are many more combinations than in the first approach, but the expected number of samples from U3 is better.
Are there other, perhaps better, approaches to doing this?
I have this marked as language agnostic, but I'm primarily looking into doing this in either Python or C++.
You do not need combinations. A slight tweak using base 3 arithmetic removes the need for a table. Rather than using the 1..3 result directly, subtract 1 to get it into the range 0..2 and treat it as a base 3 digit. For three samples you could do something like:
function sample3()
    result <- 0
    result <- result + 9 * (randU3() - 1) // High digit: 9
    result <- result + 3 * (randU3() - 1) // Middle digit: 3
    result <- result + 1 * (randU3() - 1) // Units digit: 1
    return result
end function
That will give you a number in the range 0..26, or 1..27 if you add one. You can use that number directly in the rest of your program.
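Combining this base-3 trick with the 27-to-25 rejection mentioned in the question gives a very short implementation. A hedged C++ sketch (the std::mt19937 stub merely stands in for the given U(1,3)):

#include <random>

std::mt19937 gen{std::random_device{}()};
int randU3() { return std::uniform_int_distribution<int>(1, 3)(gen); }   // stand-in for the given U(1,3)

int randU5() {
    for (;;) {
        // Three draws form a base-3 number, uniform in [0..27)
        int r = 9 * (randU3() - 1) + 3 * (randU3() - 1) + (randU3() - 1);
        if (r < 25) return r % 5 + 1;   // 25 is a multiple of 5, so the result is uniform in [1..5]
    }
}

The expected number of U3 calls is 3 * 27/25 = 3.24, matching the second approach in the question.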
For the range [1, 3] to [1, 5], this is equivalent to rolling a 5-sided die with a 3-sided one.
However, this can't be done without "wasting" randomness (or running forever in the worst case), since all the prime factors of 5 (namely 5) don't divide 3. Thus, the best that can be done is to use rejection sampling to get arbitrarily close to no "waste" of randomness (such as by batching multiple rolls of the 3-sided die until 3^n is "close enough" to a power of 5). In other words, the approaches you give in your question are as good as they can get.
More generally, an algorithm to roll a k-sided die with a p-sided die will inevitably "waste" randomness (and run forever in the worst case) unless "every prime number dividing k also divides p", according to Lemma 3 in "Simulating a dice with a dice" by B. Kloeckner. For example:
Take the much more practical case that p is a power of 2 (and any block of random bits is the same as rolling a die with a power of 2 number of faces) and k is arbitrary. In this case, this "waste" and indefinite running time are inevitable unless k is also a power of 2.
This result applies to any case of rolling an n-sided die with an m-sided die, where n and m are prime numbers. For example, look at the answers to a question for the case n = 7 and m = 5.
See also this question: Frugal conversion of uniformly distributed random numbers from one range to another.
Peter O. is right, you cannot escape losing some randomness. So the only choice is a trade-off between how expensive calls to U(1,3) are, code clarity, simplicity, etc.
Here is my variant, making bits from U(1,3) and combining them together with rejection
C/C++ (untested!)
int U13(); // your U(1,3)

int getBit() {                   // single unbiased random bit: reject 3, map 1 -> 0, 2 -> 1
    int v;
    do { v = U13(); } while (v == 3);
    return v - 1;
}

int U15() {
    int r;
    for (;;) {
        int q = getBit() + 2*getBit() + 4*getBit(); // uniform in [0...8)
        if (q < 5) {             // need range [0...5)
            r = q + 1;           // q accepted, make it in [1...5]
            break;
        }
    }
    return r;
}

Dynamic algorithm to find maximum sum of products of "accessible" numbers in an array

I have been asked to give a dynamic algorithm that takes a sequence containing an even count of numbers (both positive and negative) and does the following:
Each "turn" two numbers are chosen to be multiplied together. The algorithm can only access either end of the sequence. However, if the first number chosen is the leftmost number, the second number can be either the rightmost number, or the new leftmost number (since the old leftmost number has already been "removed/chosen") and vice-versa. The objective of the program is to find the maximum total sum of the products of the two numbers chosen each round.
Example:
Sequence: { 10, 4, 20, -5, 0, 7 }
Optimal result: 7*10 + 0*-5 + 4*20 = 150
My Progress:
I have been trying to find a dynamic approach without much luck. I've been able to deduce that the program is essentially only allowed to multiply the end numbers by the "adjacent" numbers each time, and that the objective is to multiply the smallest possible numbers by the smallest possible numbers (resulting in either a double negative multiplication - a positive number, or the least-smallest number attainable), and continue to apply this rule each time right to the finish. Contrastingly, this rule would also apply in the opposite direction - multiply the largest possible numbers by the largest possible numbers each time. Maybe the best way is to apply both methods at once? I'm not sure, as I mentioned, I haven't had much luck implementing an algorithm for this problem.
Let's look at both a recursive and a bottom-up tabulated approach. First the recursive:
{10, 4,20,-5, 0, 7}
First call:
f(0,5) = max(f(0,3)+0*7, f(2,5)+10*4, f(1,4)+10*7)
Let's follow one thread:
f(1,4) = max(f(1,2)+(-5)*0, f(3,4)+4*20, f(2,3)+4*0)
f(1,2), f(3,4), and f(2,3) are "base cases" and have a direct solution. The function can now save these in a table indexed by i,j, to be accessed later by other threads of the recursion. For example, f(2,5) = max(f(2,3)+0*7... also needs the value for f(2,3) and can avoid creating another function call if the value is already in the table. As the recursive function calls are returned, the function can save the next values in the table for f(1,4), f(2,5), and f(0,3). Since the array in this example is short, the reduction in function calls is not that significant, but for longer arrays, the number of overlapping function calls (to the same i,j) can be much larger, which is why memoization can prove more efficient.
The tabulated approach is what I tried to unfold in my other answer. Here, instead of a recursion, we rely on (in this case) a similar mathematical formulation to calculate the next values in the table, relying on other values in the table that have already been calculated. The stars under the array are meant to illustrate the order by which we calculate the values (using two nested for loops). You can see that the values needed to calculate the formula for each (i,j)-sized subset are either a base case or exist earlier in the loop order; these are: a subset extended two elements to the left, a subset extended two elements to the right, and a subset extended one element to each side.
You're probably looking for a dynamic programming algorithm. Let A be the array of numbers, the recurrence for this problem will be
f(start,stop) = max( // last two numbers multiplied + the rest of sequence,
// first two numbers multiplied + the rest of sequence,
// first number*last number + rest of sequence )
f(start,stop) is then the optimal result for the subarray running from index start to index stop. You should compute f(start,stop) for all valid values using dynamic programming or memoization.
Hint: The first part // last two numbers multiplied + the rest of sequence looks like:
f(start,stop-2) + A[stop-1]*A[stop-2]
Let i and j represent the first and last indexes of the array, A, after the previous turn. Clearly, they must represent some even-sized contiguous subset of A. Then a general case for dp[i][j] ought to be max(left, right, both) where left = A[i-2]*A[i-1] + dp[i-2][j], right = A[j+1]*A[j+2] + dp[i][j+2], and both = A[i-1]*A[j+1] + dp[i-1][j+1]; and the solution is max(A[i]*A[i+1] + dp[i][i+1]) for all i except the last.
Fortunately, we can compute the dp in a decreasing order, such that the needed values, always representing larger surrounding subsets, are already calculated (stars represent the computed subset):
{10, 4, 20, -5, 0, 7}
(Diagram: rows of stars beneath the array mark, for each computed subset, the elements it covers, in the order the table is filled; the annotated values are 70, the product 10*7, and left = (80 + 70), i.e. 4*20 plus that 70.)
Below is a code snippet of the recursive approach.
public class TestClass {

    public static void main(String[] args) {
        int[] arr = {10, 4, 20, -5, 0, 7};
        System.out.println(findMaximumSum(arr, 0, arr.length - 1));
    }

    private static int findMaximumSum(int[] arr, int start, int end) {
        if (end - start == 1)
            return arr[start] * arr[end];
        return findMaximum(
            findMaximumSum(arr, start + 2, end) + (arr[start] * arr[start + 1]),
            findMaximumSum(arr, start + 1, end - 1) + (arr[start] * arr[end]),
            findMaximumSum(arr, start, end - 2) + (arr[end] * arr[end - 1])
        );
    }

    private static int findMaximum(int x, int y, int z) {
        return Math.max(Math.max(x, y), z);
    }
}
The result is 10*4 + 20*7 + -5*0 = 180
and similarly for the input {3,9,7,1,8,2} the answer is 3*2 + 9*8 + 7*1 = 85
Let's turn this into a sweet dynamic programming formula.
We define the subproblem as follows:
We would like to maximize the total sum, while picking either the first two, the first and last, or the last two values of the subarray i, j.
Then the recurrence equation looks like this:
OPT(i,j) =
    if i > j:
        0
    else:
        max (
            v[i] * v[i+1] + OPT(i + 2, j),
            v[i] * v[j]   + OPT(i + 1, j - 1),
            v[j] * v[j-1] + OPT(i, j - 2)
        )
The topological order: the window shrinks each step, i.e. i increases and/or j decreases.
The base case is the empty window (i > j), which contributes 0; since the array has an even number of elements, i == j never occurs.
And to come to the original problem, calling OPT(0,n-1) returns the maximum sum.
The time complexity is O(n^2): dynamic programming lets us cache all values, there are O(n^2) subproblems (one per pair i, j), and each is computed in O(1) time from smaller subproblems.
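A hedged bottom-up sketch of this recurrence in C++ (function and variable names are mine; the table is filled by increasing window length, so every value it needs has already been computed):

#include <algorithm>
#include <iostream>
#include <vector>

long long maxPairSum(const std::vector<long long>& v) {
    int n = (int)v.size();                                  // n is assumed even and >= 2
    std::vector<std::vector<long long>> opt(n, std::vector<long long>(n, 0));
    for (int len = 2; len <= n; len += 2)                   // increasing window length
        for (int i = 0; i + len - 1 < n; ++i) {
            int j = i + len - 1;
            long long takeLeft  = v[i] * v[i + 1] + (len > 2 ? opt[i + 2][j] : 0);
            long long takeEnds  = v[i] * v[j]     + (len > 2 ? opt[i + 1][j - 1] : 0);
            long long takeRight = v[j] * v[j - 1] + (len > 2 ? opt[i][j - 2] : 0);
            opt[i][j] = std::max({takeLeft, takeEnds, takeRight});
        }
    return opt[0][n - 1];
}

int main() {
    std::cout << maxPairSum({10, 4, 20, -5, 0, 7}) << "\n";  // prints 180 (the 150 in the question is not optimal)
}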

Reason for the number 5381 in the DJB hash function?

Can anyone tell me why the number 5381 is used in the DJB hash function?
The DJB hash function is defined as:
h(0) = 5381
h(i) = 33 * h(i-1) + s(i)
Here's a C implementation:
unsigned int DJBHash(char* str, unsigned int len)
{
    unsigned int hash = 5381;
    unsigned int i = 0;
    for (i = 0; i < len; str++, i++)
    {
        hash = ((hash << 5) + hash) + (*str);
    }
    return hash;
}
I stumbled across a comment that sheds some light on what DJB is up to:
/*
* DJBX33A (Daniel J. Bernstein, Times 33 with Addition)
*
* This is Daniel J. Bernstein's popular `times 33' hash function as
* posted by him years ago on comp.lang.c. It basically uses a function
* like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
* known hash functions for strings. Because it is both computed very
* fast and distributes very well.
*
* The magic of number 33, i.e. why it works better than many other
* constants, prime or not, has never been adequately explained by
* anyone. So I try an explanation: if one experimentally tests all
* multipliers between 1 and 256 (as RSE did now) one detects that even
* numbers are not useable at all. The remaining 128 odd numbers
* (except for the number 1) work more or less all equally well. They
* all distribute in an acceptable way and this way fill a hash table
* with an average percent of approx. 86%.
*
* If one compares the Chi^2 values of the variants, the number 33 not
* even has the best value. But the number 33 and a few other equally
* good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
* advantage to the remaining numbers in the large set of possible
* multipliers: their multiply operation can be replaced by a faster
* operation based on just one shift plus either a single addition
* or subtraction operation. And because a hash function has to both
* distribute good _and_ has to be very fast to compute, those few
* numbers should be preferred and seems to be the reason why Daniel J.
* Bernstein also preferred it.
*
*
* -- Ralf S. Engelschall <rse#engelschall.com>
*/
That's a slightly different hash function than the one you're looking at, though it does use the 5381 magic number. The code below that comment at the link target has been unrolled.
Then I found this:
Magic Constant 5381:
1. odd number
2. prime number
3. deficient number
4. 001/010/100/000/101 b
There is also this answer to Can anybody explain the logic behind djb2 hash function? It references a post by DJB himself to a mailing list that mentions 5381 (excerpt reproduced here):
[...] practically any good multiplier works. I think you're worrying
about the fact that 31c + d doesn't cover any reasonable range of hash
values if c and d are between 0 and 255. That's why, when I discovered
the 33 hash function and started using it in my compressors, I started
with a hash value of 5381. I think you'll find that this does just as
well as a 261 multiplier.
5381 is just a number that, in testing, resulted in fewer collisions and better avalanching. You'll find "magic constants" in just about every hash algo.
I found a very interesting property of this number that may be the reason for it.
5381 is 709th prime.
709 is 127th prime.
127 is 31st prime.
31 is 11th prime.
11 is 5th prime.
5 is 3rd prime.
3 is 2nd prime.
2 is 1st prime.
5381 is the first number for which this happens 8 times. The next link would be the 5381st prime, so 5381 is a natural place to stop the chain.

How to generate a number in arbitrary range using random()={0..1} preserving uniformness and density?

Generate a random number in range [x..y] where x and y are any arbitrary floating point numbers. Use function random(), which returns a random floating point number in range [0..1] from P uniformly distributed numbers (call it "density"). Uniform distribution must be preserved and P must be scaled as well.
I think there is no easy solution to such a problem. To simplify it a bit, I ask you how to generate a number in the interval [-0.5 .. 0.5], then in [0 .. 2], then in [-2 .. 0], preserving uniformness and density? Thus, for [0 .. 2] it must generate a random number from P*2 uniformly distributed numbers.
The obvious simple solution random() * (x - y) + y will not generate all possible numbers, because the density is too low whenever abs(x-y) > 1.0; many possible values will be missed. Remember that random() returns only a number from P possible numbers. Then, if you multiply such a number by Q, it will give you only one of P possible values, scaled by Q, but you have to scale the density P by Q as well.
If I understand your problem correctly, here is a solution (but I would exclude 1 from the range):
N = numbers_in_your_random // [0, 0.2, 0.4, 0.6, 0.8] will be 5

// This turns your random number generator into one that returns integer values in [0..N[
function randomInt()
{
    return random()*N;
}

// This turns the integer random number generator into one that returns an arbitrary integer
function getRandomInt(maxValue)
{
    if (maxValue < N)
    {
        return randomInt() % maxValue;
    }
    else
    {
        baseValue = randomInt();
        bRate = maxValue DIV N;
        bMod = maxValue % N;
        if (baseValue < bMod)
        {
            bRate++;
        }
        return N*getRandomInt(bRate) + baseValue;
    }
}

// This will return a random number in the range [lower, upper[ with the same density as random()
function extendedRandom(lower, upper)
{
    diff = upper - lower;
    ndiff = diff * N;
    baseValue = getRandomInt(ndiff);
    baseValue /= N;
    return lower + baseValue;
}
If you really want to generate all possible floating point numbers in a given range with uniform numeric density, you need to take into account the floating point format. For each possible value of your binary exponent, you have a different numeric density of codes. A direct generation method will need to deal with this explicitly, and an indirect generation method will still need to take it into account. I will develop a direct method; for the sake of simplicity, the following refers exclusively to IEEE 754 single-precision (32-bit) floating point numbers.
The most difficult case is any interval that includes zero. In that case, to produce an exactly even distribution, you will need to handle every exponent down to the lowest, plus denormalized numbers. As a special case, you will need to split zero into two cases, +0 and -0.
In addition, if you are paying such close attention to the result, you will need to make sure that you are using a good pseudorandom number generator with a large enough state space that you can expect it to hit every value with near-uniform probability. This disqualifies the C/Unix rand() and possibly the *rand48() library functions; you should use something like the Mersenne Twister instead.
The key is to dissect the target interval into subintervals, each of which is covered by different combination of binary exponent and sign: within each subinterval, floating point codes are uniformly distributed.
The first step is to select the appropriate subinterval, with probability proportional to its size. If the interval contains 0, or otherwise covers a large dynamic range, this may potentially require a number of random bits up to the full range of the available exponent.
In particular, for a 32-bit IEEE-754 number, there are 256 possible exponent values. Each exponent governs a range which is half the size of the next greater exponent, except for the denormalized case, which is the same size as the smallest normal exponent region. Zero can be considered the smallest denormalized number; as mentioned above, if the target interval straddles zero, the probability of each of +0 and -0 should perhaps be cut in half, to avoid doubling its weight.
If the subinterval chosen covers the entire region governed by a particular exponent, all that is necessary is to fill the mantissa with random bits (23 bits, for 32-bit IEEE-754 floats). However, if the subinterval does not cover the entire region, you will need to generate a random mantissa that covers only that subinterval.
The simplest way to handle both the initial and secondary random steps may be to round the target interval out to include the entirety of all exponent regions partially covered, then reject and retry numbers that fall outside it. This allows the exponent to be generated with simple power-of-2 probabilities (e.g., by counting the number of leading zeroes in your random bitstream), as well as providing a simple and accurate way of generating a mantissa that covers only part of an exponent interval. (This is also a good way of handling the +/-0 special case.)
As another special case: to avoid inefficient generation for target intervals which are much smaller than the exponent regions they reside in, the "obvious simple" solution will in fact generate fairly uniform numbers for such intervals. If you want exactly uniform distributions, you can generate the sub-interval mantissa by using only enough random bits to cover that sub-interval, while still using the aforementioned rejection method to eliminate values outside the target interval.
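To make the rejection variant concrete, here is a rough single-precision sketch under simplifying assumptions of my own: the target interval is strictly positive, denormals, signs, and zero are ignored, and std::mt19937_64 supplies the random bits. It illustrates the idea rather than being a drop-in implementation.

#include <cmath>
#include <random>

std::mt19937_64 rng{std::random_device{}()};

// Sketch only: single precision, 0 < x < y, denormals/sign/zero ignored.
float uniform_in_range(float x, float y) {
    int b;
    std::frexp(y, &b);                        // smallest b with y <= 2^b
    for (;;) {
        // Pick a binade [2^(e-1), 2^e) with probability proportional to its length:
        // each extra "coin flip" halves the probability, matching the halving binade sizes.
        int e = b;
        while (e > -125 && (rng() & 1u) == 0) --e;
        float lo = std::ldexp(1.0f, e - 1);
        float hi = std::ldexp(1.0f, e);
        // 24 random bits position the value inside the binade, enough to be able
        // to reach every representable float in it.
        float frac = (rng() >> 40) * (1.0f / (1u << 24));
        float candidate = lo + frac * (hi - lo);
        if (candidate >= x && candidate <= y) return candidate;   // otherwise reject and retry
    }
}

For intervals that straddle zero one would also randomize the sign and include the denormal region, as described above.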
well, [0..1] * 2 == [0..2] (still uniform)
[0..1] - 0.5 == [-0.5..0.5] etc.
I wonder where you experienced such an interview?
Update: well, if we want to start caring about losing precision on multiplication (which is odd, because the original task did not seem to care about that) and pretend we care about the "number of values", we can start iterating. In order to do that, we need one more function, which returns uniformly distributed random values in [0..1): this can be done by discarding the value 1.0 should it ever appear. After that, we can slice the whole range into equal parts small enough that we no longer care about losing precision, choose one part randomly (we have enough randomness to do that), and choose a number within that bucket using the [0..1) function for all parts but the last one.
Or, you can come up with a way to encode enough values to care about, and just generate random bits for this code, in which case you don't really care whether it's [0..1] or just {0, 1}.
Let me rephrase your question:
Let random() be a random number generator with a discrete uniform distribution over [0,1). Let D be the number of possible values returned by random(), each of which is precisely 1/D greater than the previous. Create a random number generator rand(L, U) with a discrete uniform distribution over [L, U) such that each possible value is precisely 1/D greater than the previous.
--
A couple quick notes.
The problem in this form, as you phrased it, is unsolvable in general: if D = 1 there is nothing we can do.
I don't require that 0.0 be one of the possible values for random(). If it is not, then it is possible that the solution below will fail when U - L < 1 / D. I'm not particularly worried about that case.
I use all half-open ranges because it makes the analysis simpler. Using your closed ranges would be simple, but tedious.
Finally, the good stuff. The key insight here is that the density can be maintained by independently selecting the whole and fractional parts of the result.
First, note that given random() it is trivial to create randomBit(). That is,
randomBit() { return random() >= 0.5; }
Then, if we want to select one of {0, 1, 2, ..., 2^N - 1} uniformly at random, that is simple using randomBit(), just generate each of the bits. Call this random2(N).
Using random2() we can select one of {0, 1, 2, ..., N - 1}:
randomInt(N) { while ((val = random2(ceil(log2(N)))) >= N); return val; }
Now, if D is known, then the problem is trivial as we can reduce it to simply choosing one of floor((U - L) * D) values uniformly at random and we can do that with randomInt().
So, let's assume that D is not known. Now, let's first make a function to generate random values in the range [0, 2^N) with the proper density. This is simple.
rand2D(N) { return random2(N) + random(); }
rand2D() is where we require that the difference between consecutive possible values for random() be precisely 1/D. If not, the possible values here would not have uniform density.
Next, we need a function that selects a value in the range [0, V) with the proper density. This is similar to randomInt() above.
randD(V) { while ((val = rand2D(ceil(log2(V)))) >= V); return val; }
And finally...
rand(L, U) { return L + randD(U - L); }
We now may have offset the discrete positions if L is not a multiple of 1/D, but that is unimportant.
--
A last note, you may have noticed that several of these functions may never terminate. That is essentially a requirement. For example, random() may have only a single bit of randomness. If I then ask you to select from one of three values, you cannot do so uniformly at random with a function that is guaranteed to terminate.
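A hedged C++ transcription of this construction; random01() here simulates the given random() with an assumed density D = 100 purely so the example is self-contained, and the helper names mirror the ones above.

#include <cmath>
#include <random>

std::mt19937 gen{std::random_device{}()};
const int D = 100;                                    // assumed density of random() for the demo
double random01() { return std::uniform_int_distribution<int>(0, D - 1)(gen) / double(D); }

bool randomBit() { return random01() >= 0.5; }        // unbiased as long as D is even

// Uniform integer in [0, 2^n)
unsigned random2(int n) {
    unsigned v = 0;
    for (int i = 0; i < n; ++i) v = (v << 1) | (randomBit() ? 1u : 0u);
    return v;
}

// Uniform integer in [0, n), by rejection
unsigned randomInt(unsigned n) {
    int bits = (int)std::ceil(std::log2((double)n));
    unsigned v;
    while ((v = random2(bits)) >= n) {}
    return v;
}

// Value in [0, 2^n) on the 1/D grid
double rand2D(int n) { return random2(n) + random01(); }

// Value in [0, V) on the 1/D grid, by rejection
double randD(double V) {
    int bits = (int)std::ceil(std::log2(V));
    double v;
    while ((v = rand2D(bits)) >= V) {}
    return v;
}

double rand_range(double L, double U) { return L + randD(U - L); }

As noted above, the rejection loops may in principle run for a long time; that is unavoidable.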
Consider this approach:
I'm assuming the base random number generator in the range [0..1]
generates among the numbers
0, 1/(p-1), 2/(p-1), ..., (p-2)/(p-1), (p-1)/(p-1)
If the target interval length is less than or equal to 1,
return random()*(y-x) + x.
Else, map each number r from the base RNG to an interval in the
target range:
[r*(p-1)*(y-x)/p, (r+1/(p-1))*(p-1)*(y-x)/p]
(i.e. for each of the P numbers assign one of P intervals with length (y-x)/p)
Then recursively generate another random number in that interval and
add it to the interval begin.
Pseudocode:
const p;

function rand(x, y)
    r = random()
    if y-x <= 1
        return x + r*(y-x)
    else
        low = r*(p-1)*(y-x)/p
        high = low + (y-x)/p
        return x + rand(low, high)   // rand(low, high) already lies inside [low, high)
In real math, the solution is simply:
return random() * (upper - lower) + lower
The problem is that floating point numbers only have a certain resolution. So what you can do is apply the above function and add another random() value, scaled down to cover the missing part.
A practical example makes clear what I mean:
E.g. take random()'s return value from 0..1 with 2 digits of accuracy, i.e. 0.XY, and take lower = 100 and upper = 1100.
With the above algorithm you get as a result 0.XY * (1100-100) + 100 = XY0.0 + 100.
You will never see 201 as a result, as the final digit has to be 0.
The solution here is to generate another random value and add it multiplied by 10, so you gain one more digit of accuracy (you have to take care that you don't exceed the given range, which can happen; in that case you have to discard the result and generate a new number).
Maybe you have to repeat this; how often depends on how many places the random() function delivers and how many you expect in your final result.
A standard IEEE format has limited precision (e.g. 53 bits for double). So when you generate a number this way, you never need to generate more than one additional number.
But you have to be careful that when you add the new number, you don't exceed the given upper limit. There are multiple solutions to this: first, if you exceed the limit, you start over and generate a new number (don't cut off or similar, as this changes the distribution).
The second possibility is to check the interval size of the missing lower-digit range, find its middle value, and generate an appropriate value that guarantees that the result will fit.
You have to consider the amount of entropy that comes from each call to your RNG. Here is some C# code I just wrote that demonstrates how you can accumulate entropy from low-entropy source(s) and end up with a high-entropy random value.
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
namespace SO_8019589
{
class LowEntropyRandom
{
public readonly double EffectiveEntropyBits;
public readonly int PossibleOutcomeCount;
private readonly double interval;
private readonly Random random = new Random();
public LowEntropyRandom(int possibleOutcomeCount)
{
PossibleOutcomeCount = possibleOutcomeCount;
EffectiveEntropyBits = Math.Log(PossibleOutcomeCount, 2);
interval = 1.0 / PossibleOutcomeCount;
}
public LowEntropyRandom(int possibleOutcomeCount, int seed)
: this(possibleOutcomeCount)
{
random = new Random(seed);
}
public int Next()
{
return random.Next(PossibleOutcomeCount);
}
public double NextDouble()
{
return interval * Next();
}
}
class EntropyAccumulator
{
private List<byte> currentEntropy = new List<byte>();
public double CurrentEntropyBits { get; private set; }
public void Clear()
{
currentEntropy.Clear();
CurrentEntropyBits = 0;
}
public void Add(byte[] entropy, double effectiveBits)
{
currentEntropy.AddRange(entropy);
CurrentEntropyBits += effectiveBits;
}
public byte[] GetBytes(int count)
{
using (var hasher = new SHA512Managed())
{
count = Math.Min(count, hasher.HashSize / 8);
var bytes = new byte[count];
var hash = hasher.ComputeHash(currentEntropy.ToArray());
Array.Copy(hash, bytes, count);
return bytes;
}
}
public byte[] GetPackagedEntropy()
{
// Returns a compact byte array that represents almost all of the entropy.
return GetBytes((int)(CurrentEntropyBits / 8));
}
public double GetDouble()
{
// returns a uniformly distributed number on [0-1)
return (double)BitConverter.ToUInt64(GetBytes(8), 0) / ((double)UInt64.MaxValue + 1);
}
public int GetInt(int maxValue)
{
// returns a uniformly distributed integer on [0-maxValue)
return (int)(maxValue * GetDouble());
}
}
class Program
{
static void Main(string[] args)
{
var random = new LowEntropyRandom(2); // this only provides 1 bit of entropy per call
var desiredEntropyBits = 64; // enough for a double
while (true)
{
var adder = new EntropyAccumulator();
while (adder.CurrentEntropyBits < desiredEntropyBits)
{
adder.Add(BitConverter.GetBytes(random.Next()), random.EffectiveEntropyBits);
}
Console.WriteLine(adder.GetDouble());
Console.ReadLine();
}
}
}
}
Since I'm using a 512-bit hash function, that is the maximum amount of entropy that you can get out of the EntropyAccumulator. This could be fixed, if necessary.
If I understand your problem correctly, it's that rand() generates finely spaced but ultimately discrete random numbers. And if we multiply it by (y-x) which is large, this spreads these finely spaced floating point values out in a way that is missing many of the floating point values in the range [x,y]. Is that all right?
If so, I think we have a solution already given by Dialecticus. Let me explain why he is right.
First, we know how to generate a random float and then add another floating point value to it. This may produce a round off error due to addition, but it will be in the last decimal place only. Use doubles or something with finer numerical resolution if you want better precision. So, with that caveat, the problem is no harder than finding a random float in the range [0,y-x] with uniform density. Let's say y-x = z. Obviously, since z is a floating point it may not be an integer. We handle the problem in two steps: first we generate the random digits to the left of the decimal point and then generate the random digits to the right of it. Doing both uniformly means their sum is uniformly distributed across the range [0,z] too. Let w be the largest integer <= z. To answer our simplified problem, we can first pick a random integer from the range {0,1,...,w}. Then, step #2 is to add a random float from the unit interval to this random number. This isn't multiplied by any possibly large values, so it has as fine a resolution as the numerical type can have. (Assuming you're using an ideal random floating point number generator.)
So what about the corner case where the random integer was the largest one (i.e. w) and the random float we added to it was larger than z - w so that the random number exceeds the allowed maximum? The answer is simple: do all of it again and check the new result. Repeat until you get a digit in the allowed range. It's an easy proof that a uniformly generated random number which is tossed out and generated again if it's outside an allowed range results in a uniformly generated random in the allowed range. Once you make this key observation, you see that Dialecticus met all your criteria.
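A small hedged sketch of that two-step method (names are mine, and it assumes y > x): draw the whole part, draw the fractional part, and redo both draws whenever the sum lands above z = y - x.

#include <random>

std::mt19937 gen{std::random_device{}()};

double uniform_xy(double x, double y) {                    // assumes y > x
    double z = y - x;
    long w = (long)z;                                      // largest integer <= z
    std::uniform_int_distribution<long> whole(0, w);
    std::uniform_real_distribution<double> frac(0.0, 1.0);
    for (;;) {
        double candidate = whole(gen) + frac(gen);
        if (candidate <= z) return x + candidate;          // otherwise redo both draws
    }
}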
When you generate a random number with random(), you get a floating point number between 0 and 1 having an unknown precision (or density, you name it).
And when you multiply it with a number (NUM), you lose this precision, by lg(NUM) (10-based logarithm). So if you multiply by 1000 (NUM=1000), you lose the last 3 digits (lg(1000) = 3).
You may correct this by adding a smaller random number to the original, which supplies those missing 3 digits. But you don't know the precision, so you can't determine exactly where they are.
I can imagine two scenarios:
(X = range start, Y = range end)
1: you define the precision (PREC, e.g. 20 digits, so PREC=20), and consider it enough to generate a random number, so the expression will be:
( random() * (Y-X) + X ) + ( random() / 10 ^ (PREC-trunc(lg(Y-X))) )
with numbers: (X = 500, Y = 1500, PREC = 20)
( random() * (1500-500) + 500 ) + ( random() / 10 ^ (20-trunc(lg(1000))) )
( random() * 1000 + 500 ) + ( random() / 10 ^ (17) )
There are some problems with this:
- two-phase random generation (how random will it actually be?)
- the first random() returns 1 -> the result can be out of range
2: guess the precision by random numbers
you define some tries (e.g. 4) to calculate the precision by generating random numbers and counting the precision each time:
- 0.4663164 -> PREC=7
- 0.2581916 -> PREC=7
- 0.9147385 -> PREC=7
- 0.129141 -> PREC=6 -> 7, correcting by the average of the other tries
That's my idea.

Counting combinations of pairs of items from multiple lists without repetition

Given a scenario where we have multiple lists of pairs of items, for example:
{12,13,14,23,24}
{14,15,25}
{16,17,25,26,36}
where 12 is a pair of items '1' and '2' (and thus 21 is equivalent to 12), we want to count the number of ways that we can choose pairs of items from each of the lists such that no single item is repeated. You must select one, and only one pair, from each list. The number of items in each list and the number of lists is arbitrary, but you can assume there are at least two lists with at least one pair of items per list. And the pairs are made from symbols from a finite alphabet, assume digits [1-9]. Also, a list can neither contain duplicate pairs {12,12} or {12,21} nor can it contain symmetric pairs {11}.
More specifically, in the example above, if we choose the pair of items 14 from the first list, then the only choice we have for the second list is 25 because 14 and 15 contain a '1'. And consequently, the only choice from the third list is 36 because 16 and 17 contain a '1', and 25 and 26 contain a '2'.
Does anyone know of an efficient way to count the total combinations of pairs of items without going through every permutation of choices and asking "is this a valid selection?", as the lists can each contain hundreds of pairs of items?
UPDATE
After spending some time with this, I realized that it is trivial to count the number of combinations when none of the lists share a distinct pair. However, as soon as a distinct pair is shared between two or more lists, the combinatorial formula does not apply.
As of now, I've been trying to figure out if there is a way (using combinatorial math and not brute force) to count the number of combinations in which every list has the same pairs of items. For example:
{12,23,34,45,67}
{12,23,34,45,67}
{12,23,34,45,67}
The problem is #P-complete. This is even HARDER than NP-complete. It is as hard as finding the number of satisfying assignments to an instance of SAT.
The reduction is from Perfect matching. Suppose you have the graph G = {V, E} where E, the set of edges, is a list of pairs of vertices (those pairs that are connected by an edge). Then encode an instance of "pairs of items" by having |V|/2 copies of E. In other words, have a number of copies of E equal to half of the number of vertices. Now, a "hit" in your case would correspond to |V|/2 edges with no repeated vertices, implying that all |V| vertices were covered. This is the definition of a perfect matching. And every perfect matching would be a hit -- it's a 1-1 correspondence.
Let's say that every element in the lists is a node in a graph. There is an edge between two nodes if they can be selected together (they have no common symbol). There is no edge between two nodes of the same list. If we have n lists, the problem is to find the number of cliques of size n in this graph. There is no clique bigger than n elements. Given that finding out whether at least one such clique exists is NP-complete, I think this problem is NP-complete. See: http://en.wikipedia.org/wiki/Clique_problem
As pointed out, we have to prove that solving this problem can solve the Clique problem to show that this is NP-complete. If we can count the number of required sets, i.e. the number of size-n cliques, then we know whether there is at least one clique of size n. Unfortunately, if there is no clique of size n then we don't know whether there are cliques of size k < n.
Another question is whether we can represent any graph in this problem. I guess yes but I am not sure about it.
I still feel this is NP-Complete
While the problem looks quite simple, it could be related to the NP-complete Set Cover Problem. So it is possible that there is no efficient way to detect valid combinations, hence no efficient way to count them.
UPDATE
I thought about the list items being pairs because it seems to make the problem harder to attack - you have to check two properties for one item. So I looked for a way to reduce the pair to a scalar item and found a way.
Map the set of the n symbols to the set of the first n primes - I will call this function M. In the case of the symbols 0 to 9 we obtain the following mapping and M(4) = 11 for example.
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} => {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}
Now we can map a pair (n, m) using the mapping X to the product of the mappings of n and m. This will turn the pair (2, 5) into X((2, 5)) = M(2) * M(5) = 5 * 13 = 65.
X((n, m)) = M(n) * M(m)
Why all this? If we have two pairs (a, b) and (c, d) from two lists, map them using the mapping X to x and y and multiply them, we obtain the product x * y = M(a) * M(b) * M(c) * M(d) - a product of four primes. We can extend the product by more factors by selecting a pair from each list and obtain a product of 2w primes if we have w lists. The final question is what does this product tell us about the pairs we selected and multiplied? If the selected pairs form a valid selection, we never choose one symbol twice, hence the product contains no prime twice and is square free. If the selection is invalid the product contains at least one prime twice and is not square free. And here a final example.
X((2, 5)) = 5 * 13 = 65
X((3, 6)) = 7 * 17 = 119
X((3, 4)) = 7 * 11 = 77
Selecting 25 and 36 yields 65 * 119 = 7735 = 5 * 7 * 13 * 17 and is square free, hence valid. Selecting 36 and 34 yields 119 * 77 = 9163 = 7² * 11 * 17 and is not square free, hence not valid.
Also note how nicely this preserves the symmetry - X((m, n)) = X((n, m)) - and prohibits symmetric pairs, because X((m, m)) = M(m) * M(m) is not square free.
I don't know if this will be any help, but now you know it and can think about it...^^
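To make the idea concrete, here is a minimal C++ sketch of the encoding (my own names, C++17 for std::gcd): a selection is valid exactly when the selected codes are pairwise coprime, which is the same as the overall product being square free.

#include <numeric>
#include <vector>

const int PRIME[10] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};   // symbol s -> PRIME[s]

long long encodePair(int a, int b) { return (long long)PRIME[a] * PRIME[b]; }

// Valid iff no symbol repeats, i.e. the codes are pairwise coprime
// (equivalently, their overall product is square free).
bool validSelection(const std::vector<long long>& codes) {
    for (size_t i = 0; i < codes.size(); ++i)
        for (size_t j = i + 1; j < codes.size(); ++j)
            if (std::gcd(codes[i], codes[j]) != 1) return false;   // shared prime = shared symbol
    return true;
}

// The examples from above:
//   validSelection({encodePair(2, 5), encodePair(3, 6)})  -> true  (65 * 119 is square free)
//   validSelection({encodePair(3, 6), encodePair(3, 4)})  -> false (119 * 77 contains 7 twice)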
This is the first part of a reduction of a 3-SAT problem to this problem. The 3-SAT instance is the following.
(!A | B | C) & (B | !C | !D) & (A | !B)
And here is the reduction as far as I got.
m-n represents a pair
a line represents a list
an asterisk represents an arbitrary unique symbol
A1-A1' !A1-!A1' => Select A true or false
B1-B1' !B1-!B1' => Select B true or false
C1-C1' !C1-!C1' => Select C true or false
D1-D1' !D1-!D1' => Select D true or false
A1-* !B1-* !C1-* => !A | B | C
A2-!A1' !A2-A1' => Create a copy of A
B2-!B1' !B2-B1' => Create a copy of B
C2-!C1' !C2-C1' => Create a copy of C
D2-!D1' !D2-D1' => Create a copy of D
!B2-* C2-* D2-* => B | !C | !D
(How to perform a second copy of the four variables???)
!A3-* B3-*
If I (or somebody else) can complete this reduction and show how to do it in the general case, this will prove the problem NP-complete. I am just stuck with copying the variables a second time.
I am going to say there is no calculation that you can do other than brute force, because there is a function that has to be evaluated to decide whether an item from set B can be used given the item chosen in set A. Simple combinatorial math won't work.
You can speed up the calculation by 1 to 2 magnitudes using memoization and hashing.
Memoization is remembering previous results of similar brute force paths. If you are at list n and you have already consumed symbols x,y,z and previously you have encountered this situation, then you will be adding in the same number of possible combinations from the remaining lists. It does not matter how you got to list n using x,y,z. So, use a cached result if there is one, or continue the calc to the next list and check there. If you make a brute force recursive algorithm to calculate the result, but cache results, this works great.
The key to the saved result is: the current list, and the symbols that have been used. Sort the symbols to make your key. I think a dictionary or an array of dictionaries makes sense here.
Use hashing to reduce the number of pairs that need to be searched in each list. For each list, make a hash of the pairs that would be available given that a certain number of symbols are already consumed. Choose the number of consumed symbols you want to use in your hash based on how much memory you want to use and the time you want to spend pre-calculating. I think using 1-2 symbols would be good. Sort these hashes by the number of items in them, ascending, and then keep the top n. I say throw out the rest, because if a hash only reduces your work a small amount, it's probably not worth keeping (it will take longer to find the hash if there are more of them). So as you are going through the lists, you can do a quick scan of the list's hashes to see if you have used a symbol in the hash. If you have, then use the first hash that comes up to scan the list. The first hash would contain the fewest pairs to scan. If you are really handy, you might be able to build these hashes as you go and not waste time up front doing it.
You might be able to toss the hash and use a tree, but my guess is that filling the tree will take a long time.
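A hedged sketch of that memoization in C++ (the representation is an assumption of mine): symbols 1..9 fit in a 9-bit mask, and the cache key combines the current list index with that mask, exactly the "current list plus consumed symbols" key described above.

#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

std::vector<std::vector<Pair>> lists;                   // the input: lists of pairs over symbols 1..9
std::unordered_map<uint64_t, long long> memo;           // key: (list index, used-symbol mask) -> count

long long countFrom(size_t li, unsigned usedMask) {
    if (li == lists.size()) return 1;                   // every list got a compatible pair
    uint64_t key = (uint64_t)li << 9 | usedMask;        // usedMask fits in 9 bits for symbols 1..9
    auto it = memo.find(key);
    if (it != memo.end()) return it->second;            // same situation reached before
    long long total = 0;
    for (const Pair& p : lists[li]) {
        unsigned m = (1u << (p.first - 1)) | (1u << (p.second - 1));
        if ((m & usedMask) == 0)                        // pair uses no consumed symbol
            total += countFrom(li + 1, usedMask | m);
    }
    return memo[key] = total;
}

// countFrom(0, 0) gives the total number of valid selections.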
Constraint programming is a nice approach if you want to generate all the combinations. Just to try it out, I wrote a model using Gecode (version 3.2.2) to solve your problem. The two examples given are very easy to solve, but other instances might be harder. It should be better than generate and test in any case.
/*
* Main authors:
* Mikael Zayenz Lagerkvist <lagerkvist#gecode.org>
*
* Copyright:
* Mikael Zayenz Lagerkvist, 2009
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
* LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
* OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
* WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*
*/
#include <gecode/driver.hh>
#include <gecode/int.hh>
#include <gecode/minimodel.hh>
using namespace Gecode;
namespace {
/// List of specifications
extern const int* specs[];
/// Number of specifications
extern const unsigned int n_specs;
}
/**
* \brief Selecting pairs
*
* Given a set of lists of pairs of values, select a pair from each
* list so that no value is selected more than once.
*
*/
class SelectPairs : public Script {
protected:
/// Specification
const int* spec;
/// The values from all selected pairs
IntVarArray s;
public:
/// The actual problem
SelectPairs(const SizeOptions& opt)
: spec(specs[opt.size()]),
s(*this,spec[0] * 2,Int::Limits::min, Int::Limits::max) {
int pos = 1; // Position read from spec
// For all lists
for (int i = 0; i < spec[0]; ++i) {
int npairs = spec[pos++];
// Construct TupleSet for pairs from list i
TupleSet ts;
for (int p = 0; p < npairs; ++p) {
IntArgs tuple(2);
tuple[0] = spec[pos++];
tuple[1] = spec[pos++];
ts.add(tuple);
}
ts.finalize();
// <s[2i],s[2i+1]> must be from list i
IntVarArgs pair(2);
pair[0] = s[2*i]; pair[1] = s[2*i + 1];
extensional(*this, pair, ts);
}
// All values must be pairwise distinct
distinct(*this, s, opt.icl());
// Select values for the variables
branch(*this, s, INT_VAR_SIZE_MIN, INT_VAL_MIN);
}
/// Constructor for cloning \a s
SelectPairs(bool share, SelectPairs& sp)
: Script(share,sp), spec(sp.spec) {
s.update(*this, share, sp.s);
}
/// Perform copying during cloning
virtual Space*
copy(bool share) {
return new SelectPairs(share,*this);
}
/// Print solution
virtual void
print(std::ostream& os) const {
os << "\t";
for (int i = 0; i < spec[0]; ++i) {
os << "(" << s[2*i] << "," << s[2*i+1] << ") ";
if ((i+1) % 10 == 0)
os << std::endl << "\t";
}
if (spec[0] % 10)
os << std::endl;
}
};
/** \brief Main-function
* \relates SelectPairs
*/
int
main(int argc, char* argv[]) {
SizeOptions opt("SelectPairs");
opt.iterations(500);
opt.size(0);
opt.parse(argc,argv);
if (opt.size() >= n_specs) {
std::cerr << "Error: size must be between 0 and "
<< n_specs-1 << std::endl;
return 1;
}
Script::run<SelectPairs,DFS,SizeOptions>(opt);
return 0;
}
namespace {
const int s0[] = {
// Number of lists
3,
// Lists (number of pairs, pair0, pair1, ...)
5, 1,2, 1,3, 1,4, 2,3, 2,4,
3, 1,4, 1,5, 2,5,
5, 1,6, 1,7, 2,5, 2,6, 3,6
};
const int s1[] = {
// Number of lists
3,
// Lists (number of pairs, pair0, pair1, ...)
5, 1,2, 2,3, 3,4, 4,5, 6,7,
5, 1,2, 2,3, 3,4, 4,5, 6,7,
5, 1,2, 2,3, 3,4, 4,5, 6,7
};
const int *specs[] = {s0, s1};
const unsigned n_specs = sizeof(specs)/sizeof(int*);
}
First try: here is an algorithm with a better average complexity than brute force. Essentially you create strings of increasing length in each iteration. This may not be the best solution, but we will wait for the best one to come along... :)
Start with list 1. All entries in that list are valid solutions of length 2 (#=5)
Next, when you introduce list 2, keep a record of all valid solutions of length 4, which end up being {1325, 1425, 2314, 2315, 2415} (#=5).
When you add the third list to the mix, repeat the process. You will end up with {142536, 241536} (#=2).
The complexity reduction comes about because you are throwing away bad strings in each iteration. The worst case is still the same as brute force, namely when no two pairs share a symbol, so nothing gets pruned.
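A small C++ sketch of this iteration, with one twist that is my own: instead of storing the surviving strings themselves, it stores how many survivors use each set of symbols (again a 9-bit mask), which is all the count needs and keeps the frontier small.

#include <map>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

long long countSelections(const std::vector<std::vector<Pair>>& lists) {
    std::map<unsigned, long long> frontier{{0u, 1}};    // used-symbol mask -> number of survivors
    for (const auto& list : lists) {
        std::map<unsigned, long long> next;
        for (const auto& entry : frontier)
            for (const Pair& p : list) {
                unsigned m = (1u << (p.first - 1)) | (1u << (p.second - 1));
                if ((m & entry.first) == 0)             // pair is still compatible with this survivor
                    next[entry.first | m] += entry.second;
            }
        frontier.swap(next);
    }
    long long total = 0;
    for (const auto& entry : frontier) total += entry.second;
    return total;
}

// For the three lists at the top of this question the result is 2,
// corresponding to {14, 25, 36} and {24, 15, 36}.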
This feels like a good problem to which to apply a constraint programming approach. To the list of packages provided by Wikipedia I'll add that I've had good experience using Gecode in the past; the Gecode examples also provide a basic tutorial to constraint programming. Constraint Processing is a good book on the subject if you want to dig deeper into how the algorithms work.
