Find a math algorithm for an equation - algorithm

I just post a mathematic question at math.stackexchange, but I'll ask people here for a programmatically recursive algorithm.
The problem: fill in the blank number from 1 to 9 (once and only once each blank) to finish the equation.
Additional conditions:
1. Mathematic priority DOES matter.
2. All numbers (include evaluation result) should be integers.
Which mean the divide should be divisible (E.g. 9 mod 3 = 0 is OK, 8 mod 3 != 0 is not OK).
3. For those who don't know (as one in the original question), the operations in the diagram are:
+ = plus; : = divide; X = multiple; - = minus.
There should be more than 1 answer. I'd like to have a recursive algorithm to find out all the solutions.
Original question
PS: I'd like to learn about the recursive algorithm, performance improval. I was trying to solve the problem using brute force. My PC freeze for quite a while.

You have to find the right permuations
9! = 362880
This is not a big number and you can do your calculations the following way:
isValid(elements)
//return true if and only if the permutation of elements yields the expected result
end isValid
isValid is the validator, which checks whether a given permutation is correct.
calculate(elements, depth)
//End sign
if (depth >= 9) then
//if valid, then store
if (isValid(elements)) then
store(elements)
end if
return
end if
//iterate elements
for element = 1 to 9
//exclude elements already in the set
if (not contains(elements, element)) then
calculate(union(elements, element), depth + 1)
end if
end for
end calculate
Call calculate as follows:
calculate(emptySet, 1)

Here's a solution using PARI/GP:
div(a,b)=if(b&&a%b==0,a/b,error())
f(v)=
{
iferr(
v[1]+div(13*v[2],v[3])+v[4]+12*v[5]-v[6]-11+div(v[7]*v[8],v[9])-10==66
, E, 0)
}
for(i=0,9!-1,if(f(t=numtoperm(9,i)),print(t)))
The function f defines the particular function here. I used a helper function div which throws an error if the division fails (producing a non-integer or dividing by 0).
The program could be made more efficient by splitting out the blocks which involve division and aborting early if they fail. But since this takes only milliseconds to run through all 9! permutations I didn't think it was worth it.

Related

Algorithm for a stair climbing permutation

I've been asked to build an algorithm that involves a permutation and I'm a little stumped and looking for a starting place. The details are this...
You are climbing a staircase of n stairs. Each time you can either climb 1 or 2 steps at a time. How many distinct ways can you climb to the top?
Any suggestions how I can tackle this challenge?
You're wrong about permutations. Permutations involve orderings of a set. This problem is something else.
Problems involving simple decisions that produce another instance of the same problem are often solvable by dynamic programming. This is such a problem.
When you have n steps to climb, you can choose a hop of either 1 or 2 steps, then solve the smaller problems for n-1 and n-2 steps respectively. In this case you want to add the numbers of possibilities. (Many DPs are to find minimums or maximums instead, so this is a bit unusual.)
The "base cases" are when you have either 0 or 1 step. There's exactly 1 way to traverse each of these.
With all that in mind, we can write this dynamic program for the number of ways to climb n steps as a recursive expression:
W(n) = W(n - 1) + W(n - 2) if n > 1
1 n == 0, 1
Now you don't want to implement this as a simple recursive function. It will take time exponential in n to compute because each call to W calls itself twice. Yet most of those calls are unnecessary repeats. What to do?
One way to get the job done is find a way to compute the W(i) values in sequence. For a 1-valued DP it's usually quite simple, and so it is here:
W(0) = 1
W(1) = 1 (from the base case)
W(2) = W(1) + W(0) = 1 + 1 = 2
W(3) = W(2) + W(1) = 2 + 1 = 3
You get the idea. This is a very simple DP indeed. To compute W(i), we need only two previous values, W(i-1) and W(i-2). A simple O(n) loop will do the trick.
As a sanity check, look at W(3)=3. Indeed, to go up 3 steps, we can take hops of 1 then 2, 2 then 1, or three hops of 1. That's 3 ways!
Can't resist one more? W(4)=2+3=5. The hop sequences are (2,2), (2,1,1), (1,2,1), (1,1,2), and (1,1,1,1): 5 ways.
In fact, the chart above will look familiar to many. The number of ways to climb n steps is the (n+1)th Fibonacci number. You should code the loop yourself. If stuck, you can look up any of the hundreds of posted examples.
private static void printPath(int n,String path) {
if (n == 0) {
System.out.println(path.substring(1));
}
if (n-1 >=0) {
printPath(n-1,path + ",1");
}
if (n-2 >=0) {
printPath(n-2,path + ",2");
}
}
public static void main(String[] args) {
printPath(4,"");
}
Since it looks like a programming assignment, I will give you the steps to get you started instead of giving the actual code:
You can write a recursive function which keeps track of the step you are on.
If you have reached step n then you found one permutation or if you are past n then you should discard it.
At every step you can call the function by incrementing the steps by 1 and 2
The function will return the number of permutations possible once finished

Efficient Algorithm to find combination of numbers for an answer [duplicate]

I'm working on a homework problem that asks me this:
Tiven a finite set of numbers, and a target number, find if the set can be used to calculate the target number using basic math operations (add, sub, mult, div) and using each number in the set exactly once (so I need to exhaust the set). This has to be done with recursion.
So, for example, if I have the set
{1, 2, 3, 4}
and target 10, then I could get to it by using
((3 * 4) - 2)/1 = 10.
I'm trying to phrase the algorithm in pseudo-code, but so far haven't gotten too far. I'm thinking graphs are the way to go, but would definitely appreciate help on this. thanks.
This isn't meant to be the fastest solution, but rather an instructive one.
It recursively generates all equations in postfix notation
It also provides a translation from postfix to infix notation
There is no actual arithmetic calculation done, so you have to implement that on your own
Be careful about division by zero
With 4 operands, 4 possible operators, it generates all 7680 = 5 * 4! * 4^3
possible expressions.
5 is Catalan(3). Catalan(N) is the number of ways to paranthesize N+1 operands.
4! because the 4 operands are permutable
4^3 because the 3 operators each have 4 choice
This definitely does not scale well, as the number of expressions for N operands is [1, 8, 192, 7680, 430080, 30965760, 2724986880, ...].
In general, if you have n+1 operands, and must insert n operators chosen from k possibilities, then there are (2n)!/n! k^n possible equations.
Good luck!
import java.util.*;
public class Expressions {
static String operators = "+-/*";
static String translate(String postfix) {
Stack<String> expr = new Stack<String>();
Scanner sc = new Scanner(postfix);
while (sc.hasNext()) {
String t = sc.next();
if (operators.indexOf(t) == -1) {
expr.push(t);
} else {
expr.push("(" + expr.pop() + t + expr.pop() + ")");
}
}
return expr.pop();
}
static void brute(Integer[] numbers, int stackHeight, String eq) {
if (stackHeight >= 2) {
for (char op : operators.toCharArray()) {
brute(numbers, stackHeight - 1, eq + " " + op);
}
}
boolean allUsedUp = true;
for (int i = 0; i < numbers.length; i++) {
if (numbers[i] != null) {
allUsedUp = false;
Integer n = numbers[i];
numbers[i] = null;
brute(numbers, stackHeight + 1, eq + " " + n);
numbers[i] = n;
}
}
if (allUsedUp && stackHeight == 1) {
System.out.println(eq + " === " + translate(eq));
}
}
static void expression(Integer... numbers) {
brute(numbers, 0, "");
}
public static void main(String args[]) {
expression(1, 2, 3, 4);
}
}
Before thinking about how to solve the problem (like with graphs), it really helps to just look at the problem. If you find yourself stuck and can't seem to come up with any pseudo-code, then most likely there is something that is holding you back; Some other question or concern that hasn't been addressed yet. An example 'sticky' question in this case might be, "What exactly is recursive about this problem?"
Before you read the next paragraph, try to answer this question first. If you knew what was recursive about the problem, then writing a recursive method to solve it might not be very difficult.
You want to know if some expression that uses a set of numbers (each number used only once) gives you a target value. There are four binary operations, each with an inverse. So, in other words, you want to know if the first number operated with some expression of the other numbers gives you the target. Well, in other words, you want to know if some expression of the 'other' numbers is [...]. If not, then using the first operation with the first number doesn't really give you what you need, so try the other ops. If they don't work, then maybe it just wasn't meant to be.
Edit: I thought of this for an infix expression of four operators without parenthesis, since a comment on the original question said that parenthesis were added for the sake of an example (for clarity?) and the use of parenthesis was not explicitly stated.
Well, you didn't mention efficiency so I'm going to post a really brute force solution and let you optimize it if you want to. Since you can have parantheses, it's easy to brute force it using Reverse Polish Notation:
First of all, if your set has n numbers, you must use exactly n - 1 operators. So your solution will be given by a sequence of 2n - 1 symbols from {{your given set}, {*, /, +, -}}
st = a stack of length 2n - 1
n = numbers in your set
a = your set, to which you add *, /, +, -
v[i] = 1 if the NUMBER i has been used before, 0 otherwise
void go(int k)
{
if ( k > 2n - 1 )
{
// eval st as described on Wikipedia.
// Careful though, it might not be valid, so you'll have to check that it is
// if it evals to your target value great, you can build your target from the given numbers. Otherwise, go on.
return;
}
for ( each symbol x in a )
if ( x isn't a number or x is a number but v[x] isn't 1 )
{
st[k] = x;
if ( x is a number )
v[x] = 1;
go(k + 1);
}
}
Generally speaking, when you need to do something recursively it helps to start from the "bottom" and think your way up.
Consider: You have a set S of n numbers {a,b,c,...}, and a set of four operations {+,-,*,/}. Let's call your recursive function that operates on the set F(S)
If n is 1, then F(S) will just be that number.
If n is 2, F(S) can be eight things:
pick your left-hand number from S (2 choices)
then pick an operation to apply (4 choices)
your right-hand number will be whatever is left in the set
Now, you can generalize from the n=2 case:
Pick a number x from S to be the left-hand operand (n choices)
Pick an operation to apply
your right hand number will be F(S-x)
I'll let you take it from here. :)
edit: Mark poses a valid criticism; the above method won't get absolutely everything. To fix that problem, you need to think about it in a slightly different way:
At each step, you first pick an operation (4 choices), and then
partition S into two sets, for the left and right hand operands,
and recursively apply F to both partitions
Finding all partitions of a set into 2 parts isn't trivial itself, though.
Your best clue about how to approach this problem is the fact that your teacher/professor wants you to use recursion. That is, this isn't a math problem - it is a search problem.
Not to give too much away (it is homework after all), but you have to spawn a call to the recursive function using an operator, a number and a list containing the remaining numbers. The recursive function will extract a number from the list and, using the operation passed in, combine it with the number passed in (which is your running total). Take the running total and call yourself again with the remaining items on the list (you'll have to iterate the list within the call but the sequence of calls is depth-first). Do this once for each of the four operators unless Success has been achieved by a previous leg of the search.
I updated this to use a list instead of a stack
When the result of the operation is your target number and your list is empty, then you have successfully found the set of operations (those that traced the path to the successful leaf) - set the Success flag and unwind. Note that the operators aren't on a list nor are they in the call: the function itself always iterates over all four. Your mechanism for "unwinding" the operator sequence from the successful leaf to get the sequence is to return the current operator and number prepended to the value returned by recursive call (only one of which will be successful since you stop at success - that, obviously, is the one to use). If none are successful, then what you return isn't important anyhow.
Update This is much harder when you have to consider expressions like the one that Daniel posted. You have combinatorics on the numbers and the groupings (numbers due to the fact that / and - are order sensitive even without grouping and grouping because it changes precedence). Then, of course, you also have the combinatorics of the operations. It is harder to manage the differences between (4 + 3) * 2 and 4 + (3 * 2) because grouping doesn't recurse like operators or numbers (which you can just iterate over in a breadth-first manner while making your (depth-first) recursive calls).
Here's some Python code to get you started: it just prints all the possible expressions, without worrying too much about redundancy. You'd need to modify it to evaluate expressions and compare to the target number, rather than simply printing them.
The basic idea is: given a set S of numbers, partition S into two subsets left and right in all possible ways (where we don't care about the order or the elements in left and right), such that left and right are both nonempty. Now for each of these partitions, find all ways of combining the elements in left (recursively!), and similarly for right, and combine the two resulting values with all possible operators. The recursion bottoms out when a set has just one element, in which case there's only one value possible.
Even if you don't know Python, the expressions function should be reasonably easy to follow; the splittings function contains some Python oddities, but all it does is to find all the partitions of the list l into left and right pieces.
def splittings(l):
n = len(l)
for i in xrange(2**n):
left = [e for b, e in enumerate(l) if i & 2**b]
right = [e for b, e in enumerate(l) if not i & 2**b]
yield left, right
def expressions(l):
if len(l) == 1:
yield l[0]
else:
for left, right in splittings(l):
if not left or not right:
continue
for el in expressions(left):
for er in expressions(right):
for operator in '+-*/':
yield '(' + el + operator + er + ')'
for x in expressions('1234'):
print x
pusedo code:
Works(list, target)
for n in list
tmp=list.remove(n)
return Works(tmp,target+n) or Works(tmp,target-n) or Works(tmp, n-target) or ...
then you just have to put the base case in. I think I gave away to much.

Number of ways to add up to a sum S with N numbers

Say S = 5 and N = 3 the solutions would look like - <0,0,5> <0,1,4> <0,2,3> <0,3,2> <5,0,0> <2,3,0> <3,2,0> <1,2,2> etc etc.
In the general case, N nested loops can be used to solve the problem. Run N nested loop, inside them check if the loop variables add upto S.
If we do not know N ahead of time, we can use a recursive solution. In each level, run a loop starting from 0 to N, and then call the function itself again. When we reach a depth of N, see if the numbers obtained add up to S.
Any other dynamic programming solution?
Try this recursive function:
f(s, n) = 1 if s = 0
= 0 if s != 0 and n = 0
= sum f(s - i, n - 1) over i in [0, s] otherwise
To use dynamic programming you can cache the value of f after evaluating it, and check if the value already exists in the cache before evaluating it.
There is a closed form formula : binomial(s + n - 1, s) or binomial(s+n-1,n-1)
Those numbers are the simplex numbers.
If you want to compute them, use the log gamma function or arbitrary precision arithmetic.
See https://math.stackexchange.com/questions/2455/geometric-proof-of-the-formula-for-simplex-numbers
I have my own formula for this. We, together with my friend Gio made an investigative report concerning this. The formula that we got is [2 raised to (n-1) - 1], where n is the number we are looking for how many addends it has.
Let's try.
If n is 1: its addends are o. There's no two or more numbers that we can add to get a sum of 1 (excluding 0). Let's try a higher number.
Let's try 4. 4 has addends: 1+1+1+1, 1+2+1, 1+1+2, 2+1+1, 1+3, 2+2, 3+1. Its total is 7.
Let's check with the formula. 2 raised to (4-1) - 1 = 2 raised to (3) - 1 = 8-1 =7.
Let's try 15. 2 raised to (15-1) - 1 = 2 raised to (14) - 1 = 16384 - 1 = 16383. Therefore, there are 16383 ways to add numbers that will equal to 15.
(Note: Addends are positive numbers only.)
(You can try other numbers, to check whether our formula is correct or not.)
This can be calculated in O(s+n) (or O(1) if you don't mind an approximation) in the following way:
Imagine we have a string with n-1 X's in it and s o's. So for your example of s=5, n=3, one example string would be
oXooXoo
Notice that the X's divide the o's into three distinct groupings: one of length 1, length 2, and length 2. This corresponds to your solution of <1,2,2>. Every possible string gives us a different solution, by counting the number of o's in a row (a 0 is possible: for example, XoooooX would correspond to <0,5,0>). So by counting the number of possible strings of this form, we get the answer to your question.
There are s+(n-1) positions to choose for s o's, so the answer is Choose(s+n-1, s).
There is a fixed formula to find the answer. If you want to find the number of ways to get N as the sum of R elements. The answer is always:
(N+R-1)!/((R-1)!*(N)!)
or in other words:
(N+R-1) C (R-1)
This actually looks a lot like a Towers of Hanoi problem, without the constraint of stacking disks only on larger disks. You have S disks that can be in any combination on N towers. So that's what got me thinking about it.
What I suspect is that there is a formula we can deduce that doesn't require the recursive programming. I'll need a bit more time though.

Computing target number from numbers in a set

I'm working on a homework problem that asks me this:
Tiven a finite set of numbers, and a target number, find if the set can be used to calculate the target number using basic math operations (add, sub, mult, div) and using each number in the set exactly once (so I need to exhaust the set). This has to be done with recursion.
So, for example, if I have the set
{1, 2, 3, 4}
and target 10, then I could get to it by using
((3 * 4) - 2)/1 = 10.
I'm trying to phrase the algorithm in pseudo-code, but so far haven't gotten too far. I'm thinking graphs are the way to go, but would definitely appreciate help on this. thanks.
This isn't meant to be the fastest solution, but rather an instructive one.
It recursively generates all equations in postfix notation
It also provides a translation from postfix to infix notation
There is no actual arithmetic calculation done, so you have to implement that on your own
Be careful about division by zero
With 4 operands, 4 possible operators, it generates all 7680 = 5 * 4! * 4^3
possible expressions.
5 is Catalan(3). Catalan(N) is the number of ways to paranthesize N+1 operands.
4! because the 4 operands are permutable
4^3 because the 3 operators each have 4 choice
This definitely does not scale well, as the number of expressions for N operands is [1, 8, 192, 7680, 430080, 30965760, 2724986880, ...].
In general, if you have n+1 operands, and must insert n operators chosen from k possibilities, then there are (2n)!/n! k^n possible equations.
Good luck!
import java.util.*;
public class Expressions {
static String operators = "+-/*";
static String translate(String postfix) {
Stack<String> expr = new Stack<String>();
Scanner sc = new Scanner(postfix);
while (sc.hasNext()) {
String t = sc.next();
if (operators.indexOf(t) == -1) {
expr.push(t);
} else {
expr.push("(" + expr.pop() + t + expr.pop() + ")");
}
}
return expr.pop();
}
static void brute(Integer[] numbers, int stackHeight, String eq) {
if (stackHeight >= 2) {
for (char op : operators.toCharArray()) {
brute(numbers, stackHeight - 1, eq + " " + op);
}
}
boolean allUsedUp = true;
for (int i = 0; i < numbers.length; i++) {
if (numbers[i] != null) {
allUsedUp = false;
Integer n = numbers[i];
numbers[i] = null;
brute(numbers, stackHeight + 1, eq + " " + n);
numbers[i] = n;
}
}
if (allUsedUp && stackHeight == 1) {
System.out.println(eq + " === " + translate(eq));
}
}
static void expression(Integer... numbers) {
brute(numbers, 0, "");
}
public static void main(String args[]) {
expression(1, 2, 3, 4);
}
}
Before thinking about how to solve the problem (like with graphs), it really helps to just look at the problem. If you find yourself stuck and can't seem to come up with any pseudo-code, then most likely there is something that is holding you back; Some other question or concern that hasn't been addressed yet. An example 'sticky' question in this case might be, "What exactly is recursive about this problem?"
Before you read the next paragraph, try to answer this question first. If you knew what was recursive about the problem, then writing a recursive method to solve it might not be very difficult.
You want to know if some expression that uses a set of numbers (each number used only once) gives you a target value. There are four binary operations, each with an inverse. So, in other words, you want to know if the first number operated with some expression of the other numbers gives you the target. Well, in other words, you want to know if some expression of the 'other' numbers is [...]. If not, then using the first operation with the first number doesn't really give you what you need, so try the other ops. If they don't work, then maybe it just wasn't meant to be.
Edit: I thought of this for an infix expression of four operators without parenthesis, since a comment on the original question said that parenthesis were added for the sake of an example (for clarity?) and the use of parenthesis was not explicitly stated.
Well, you didn't mention efficiency so I'm going to post a really brute force solution and let you optimize it if you want to. Since you can have parantheses, it's easy to brute force it using Reverse Polish Notation:
First of all, if your set has n numbers, you must use exactly n - 1 operators. So your solution will be given by a sequence of 2n - 1 symbols from {{your given set}, {*, /, +, -}}
st = a stack of length 2n - 1
n = numbers in your set
a = your set, to which you add *, /, +, -
v[i] = 1 if the NUMBER i has been used before, 0 otherwise
void go(int k)
{
if ( k > 2n - 1 )
{
// eval st as described on Wikipedia.
// Careful though, it might not be valid, so you'll have to check that it is
// if it evals to your target value great, you can build your target from the given numbers. Otherwise, go on.
return;
}
for ( each symbol x in a )
if ( x isn't a number or x is a number but v[x] isn't 1 )
{
st[k] = x;
if ( x is a number )
v[x] = 1;
go(k + 1);
}
}
Generally speaking, when you need to do something recursively it helps to start from the "bottom" and think your way up.
Consider: You have a set S of n numbers {a,b,c,...}, and a set of four operations {+,-,*,/}. Let's call your recursive function that operates on the set F(S)
If n is 1, then F(S) will just be that number.
If n is 2, F(S) can be eight things:
pick your left-hand number from S (2 choices)
then pick an operation to apply (4 choices)
your right-hand number will be whatever is left in the set
Now, you can generalize from the n=2 case:
Pick a number x from S to be the left-hand operand (n choices)
Pick an operation to apply
your right hand number will be F(S-x)
I'll let you take it from here. :)
edit: Mark poses a valid criticism; the above method won't get absolutely everything. To fix that problem, you need to think about it in a slightly different way:
At each step, you first pick an operation (4 choices), and then
partition S into two sets, for the left and right hand operands,
and recursively apply F to both partitions
Finding all partitions of a set into 2 parts isn't trivial itself, though.
Your best clue about how to approach this problem is the fact that your teacher/professor wants you to use recursion. That is, this isn't a math problem - it is a search problem.
Not to give too much away (it is homework after all), but you have to spawn a call to the recursive function using an operator, a number and a list containing the remaining numbers. The recursive function will extract a number from the list and, using the operation passed in, combine it with the number passed in (which is your running total). Take the running total and call yourself again with the remaining items on the list (you'll have to iterate the list within the call but the sequence of calls is depth-first). Do this once for each of the four operators unless Success has been achieved by a previous leg of the search.
I updated this to use a list instead of a stack
When the result of the operation is your target number and your list is empty, then you have successfully found the set of operations (those that traced the path to the successful leaf) - set the Success flag and unwind. Note that the operators aren't on a list nor are they in the call: the function itself always iterates over all four. Your mechanism for "unwinding" the operator sequence from the successful leaf to get the sequence is to return the current operator and number prepended to the value returned by recursive call (only one of which will be successful since you stop at success - that, obviously, is the one to use). If none are successful, then what you return isn't important anyhow.
Update This is much harder when you have to consider expressions like the one that Daniel posted. You have combinatorics on the numbers and the groupings (numbers due to the fact that / and - are order sensitive even without grouping and grouping because it changes precedence). Then, of course, you also have the combinatorics of the operations. It is harder to manage the differences between (4 + 3) * 2 and 4 + (3 * 2) because grouping doesn't recurse like operators or numbers (which you can just iterate over in a breadth-first manner while making your (depth-first) recursive calls).
Here's some Python code to get you started: it just prints all the possible expressions, without worrying too much about redundancy. You'd need to modify it to evaluate expressions and compare to the target number, rather than simply printing them.
The basic idea is: given a set S of numbers, partition S into two subsets left and right in all possible ways (where we don't care about the order or the elements in left and right), such that left and right are both nonempty. Now for each of these partitions, find all ways of combining the elements in left (recursively!), and similarly for right, and combine the two resulting values with all possible operators. The recursion bottoms out when a set has just one element, in which case there's only one value possible.
Even if you don't know Python, the expressions function should be reasonably easy to follow; the splittings function contains some Python oddities, but all it does is to find all the partitions of the list l into left and right pieces.
def splittings(l):
n = len(l)
for i in xrange(2**n):
left = [e for b, e in enumerate(l) if i & 2**b]
right = [e for b, e in enumerate(l) if not i & 2**b]
yield left, right
def expressions(l):
if len(l) == 1:
yield l[0]
else:
for left, right in splittings(l):
if not left or not right:
continue
for el in expressions(left):
for er in expressions(right):
for operator in '+-*/':
yield '(' + el + operator + er + ')'
for x in expressions('1234'):
print x
pusedo code:
Works(list, target)
for n in list
tmp=list.remove(n)
return Works(tmp,target+n) or Works(tmp,target-n) or Works(tmp, n-target) or ...
then you just have to put the base case in. I think I gave away to much.

How can I randomly iterate through a large Range?

I would like to randomly iterate through a range. Each value will be visited only once and all values will eventually be visited. For example:
class Array
def shuffle
ret = dup
j = length
i = 0
while j > 1
r = i + rand(j)
ret[i], ret[r] = ret[r], ret[i]
i += 1
j -= 1
end
ret
end
end
(0..9).to_a.shuffle.each{|x| f(x)}
where f(x) is some function that operates on each value. A Fisher-Yates shuffle is used to efficiently provide random ordering.
My problem is that shuffle needs to operate on an array, which is not cool because I am working with astronomically large numbers. Ruby will quickly consume a large amount of RAM trying to create a monstrous array. Imagine replacing (0..9) with (0..99**99). This is also why the following code will not work:
tried = {} # store previous attempts
bigint = 99**99
bigint.times {
x = rand(bigint)
redo if tried[x]
tried[x] = true
f(x) # some function
}
This code is very naive and quickly runs out of memory as tried obtains more entries.
What sort of algorithm can accomplish what I am trying to do?
[Edit1]: Why do I want to do this? I'm trying to exhaust the search space of a hash algorithm for a N-length input string looking for partial collisions. Each number I generate is equivalent to a unique input string, entropy and all. Basically, I'm "counting" using a custom alphabet.
[Edit2]: This means that f(x) in the above examples is a method that generates a hash and compares it to a constant, target hash for partial collisions. I do not need to store the value of x after I call f(x) so memory should remain constant over time.
[Edit3/4/5/6]: Further clarification/fixes.
[Solution]: The following code is based on #bta's solution. For the sake of conciseness, next_prime is not shown. It produces acceptable randomness and only visits each number once. See the actual post for more details.
N = size_of_range
Q = ( 2 * N / (1 + Math.sqrt(5)) ).to_i.next_prime
START = rand(N)
x = START
nil until f( x = (x + Q) % N ) == START # assuming f(x) returns x
I just remembered a similar problem from a class I took years ago; that is, iterating (relatively) randomly through a set (completely exhausting it) given extremely tight memory constraints. If I'm remembering this correctly, our solution algorithm was something like this:
Define the range to be from 0 to
some number N
Generate a random starting point x[0] inside N
Generate an iterator Q less than N
Generate successive points x[n] by adding Q to
the previous point and wrapping around if needed. That
is, x[n+1] = (x[n] + Q) % N
Repeat until you generate a new point equal to the starting point.
The trick is to find an iterator that will let you traverse the entire range without generating the same value twice. If I'm remembering correctly, any relatively prime N and Q will work (the closer the number to the bounds of the range the less 'random' the input). In that case, a prime number that is not a factor of N should work. You can also swap bytes/nibbles in the resulting number to change the pattern with which the generated points "jump around" in N.
This algorithm only requires the starting point (x[0]), the current point (x[n]), the iterator value (Q), and the range limit (N) to be stored.
Perhaps someone else remembers this algorithm and can verify if I'm remembering it correctly?
As #Turtle answered, you problem doesn't have a solution. #KandadaBoggu and #bta solution gives you random numbers is some ranges which are or are not random. You get clusters of numbers.
But I don't know why you care about double occurence of the same number. If (0..99**99) is your range, then if you could generate 10^10 random numbers per second (if you have a 3 GHz processor and about 4 cores on which you generate one random number per CPU cycle - which is imposible, and ruby will even slow it down a lot), then it would take about 10^180 years to exhaust all the numbers. You have also probability about 10^-180 that two identical numbers will be generated during a whole year. Our universe has probably about 10^9 years, so if your computer could start calculation when the time began, then you would have probability about 10^-170 that two identical numbers were generated. In the other words - practicaly it is imposible and you don't have to care about it.
Even if you would use Jaguar (top 1 from www.top500.org supercomputers) with only this one task, you still need 10^174 years to get all numbers.
If you don't belive me, try
tried = {} # store previous attempts
bigint = 99**99
bigint.times {
x = rand(bigint)
puts "Oh, no!" if tried[x]
tried[x] = true
}
I'll buy you a beer if you will even once see "Oh, no!" on your screen during your life time :)
I could be wrong, but I don't think this is doable without storing some state. At the very least, you're going to need some state.
Even if you only use one bit per value (has this value been tried yes or no) then you will need X/8 bytes of memory to store the result (where X is the largest number). Assuming that you have 2GB of free memory, this would leave you with more than 16 million numbers.
Break the range in to manageable batches as shown below:
def range_walker range, batch_size = 100
size = (range.end - range.begin) + 1
n = size/batch_size
n.times do |i|
x = i * batch_size + range.begin
y = x + batch_size
(x...y).sort_by{rand}.each{|z| p z}
end
d = (range.end - size%batch_size + 1)
(d..range.end).sort_by{rand}.each{|z| p z }
end
You can further randomize solution by randomly choosing the batch for processing.
PS: This is a good problem for map-reduce. Each batch can be worked by independent nodes.
Reference:
Map-reduce in Ruby
you can randomly iterate an array with shuffle method
a = [1,2,3,4,5,6,7,8,9]
a.shuffle!
=> [5, 2, 8, 7, 3, 1, 6, 4, 9]
You want what's called a "full cycle iterator"...
Here is psudocode for the simplest version which is perfect for most uses...
function fullCycleStep(sample_size, last_value, random_seed = 31337, prime_number = 32452843) {
if last_value = null then last_value = random_seed % sample_size
return (last_value + prime_number) % sample_size
}
If you call this like so:
sample = 10
For i = 1 to sample
last_value = fullCycleStep(sample, last_value)
print last_value
next
It would generate random numbers, looping through all 10, never repeating If you change random_seed, which can be anything, or prime_number, which must be greater than, and not be evenly divisible by sample_size, you will get a new random order, but you will still never get a duplicate.
Database systems and other large-scale systems do this by writing the intermediate results of recursive sorts to a temp database file. That way, they can sort massive numbers of records while only keeping limited numbers of records in memory at any one time. This tends to be complicated in practice.
How "random" does your order have to be? If you don't need a specific input distribution, you could try a recursive scheme like this to minimize memory usage:
def gen_random_indices
# Assume your input range is (0..(10**3))
(0..3).sort_by{rand}.each do |a|
(0..3).sort_by{rand}.each do |b|
(0..3).sort_by{rand}.each do |c|
yield "#{a}#{b}#{c}".to_i
end
end
end
end
gen_random_indices do |idx|
run_test_with_index(idx)
end
Essentially, you are constructing the index by randomly generating one digit at a time. In the worst-case scenario, this will require enough memory to store 10 * (number of digits). You will encounter every number in the range (0..(10**3)) exactly once, but the order is only pseudo-random. That is, if the first loop sets a=1, then you will encounter all three-digit numbers of the form 1xx before you see the hundreds digit change.
The other downside is the need to manually construct the function to a specified depth. In your (0..(99**99)) case, this would likely be a problem (although I suppose you could write a script to generate the code for you). I'm sure there's probably a way to re-write this in a state-ful, recursive manner, but I can't think of it off the top of my head (ideas, anyone?).
[Edit]: Taking into account #klew and #Turtle's answers, the best I can hope for is batches of random (or close to random) numbers.
This is a recursive implementation of something similar to KandadaBoggu's solution. Basically, the search space (as a range) is partitioned into an array containing N equal-sized ranges. Each range is fed back in a random order as a new search space. This continues until the size of the range hits a lower bound. At this point the range is small enough to be converted into an array, shuffled, and checked.
Even though it is recursive, I haven't blown the stack yet. Instead, it errors out when attempting to partition a search space larger than about 10^19 keys. I has to do with the numbers being too large to convert to a long. It can probably be fixed:
# partition a range into an array of N equal-sized ranges
def partition(range, n)
ranges = []
first = range.first
last = range.last
length = last - first + 1
step = length / n # integer division
((first + step - 1)..last).step(step) { |i|
ranges << (first..i)
first = i + 1
}
# append any extra onto the last element
ranges[-1] = (ranges[-1].first)..last if last > step * ranges.length
ranges
end
I hope the code comments help shed some light on my original question.
pastebin: full source
Note: PW_LEN under # options can be changed to a lower number in order to get quicker results.
For a prohibitively large space, like
space = -10..1000000000000000000000
You can add this method to Range.
class Range
M127 = 170_141_183_460_469_231_731_687_303_715_884_105_727
def each_random(seed = 0)
return to_enum(__method__) { size } unless block_given?
unless first.kind_of? Integer
raise TypeError, "can't randomly iterate from #{first.class}"
end
sample_size = self.end - first + 1
sample_size -= 1 if exclude_end?
j = coprime sample_size
v = seed % sample_size
each do
v = (v + j) % sample_size
yield first + v
end
end
protected
def gcd(a,b)
b == 0 ? a : gcd(b, a % b)
end
def coprime(a, z = M127)
gcd(a, z) == 1 ? z : coprime(a, z + 1)
end
end
You could then
space.each_random { |i| puts i }
729815750697818944176
459631501395637888351
189447252093456832526
919263002791275776712
649078753489094720887
378894504186913665062
108710254884732609237
838526005582551553423
568341756280370497598
298157506978189441773
27973257676008385948
757789008373827330134
487604759071646274309
217420509769465218484
947236260467284162670
677052011165103106845
406867761862922051020
136683512560740995195
866499263258559939381
596315013956378883556
326130764654197827731
55946515352016771906
785762266049835716092
515578016747654660267
...
With a good amount of randomness so long as your space is a few orders smaller than M127.
Credit to #nick-steele and #bta for the approach.
This isn't really a Ruby-specific answer but I hope it's permitted. Andrew Kensler gives a C++ "permute()" function that does exactly this in his "Correlated Multi-Jittered Sampling" report.
As I understand it, the exact function he provides really only works if your "array" is up to size 2^27, but the general idea could be used for arrays of any size.
I'll do my best to sort of explain it. The first part is you need a hash that is reversible "for any power-of-two sized domain". Consider x = i + 1. No matter what x is, even if your integer overflows, you can determine what i was. More specifically, you can always determine the bottom n-bits of i from the bottom n-bits of x. Addition is a reversible hash operation, as is multiplication by an odd number, as is doing a bitwise xor by a constant. If you know a specific power-of-two domain, you can scramble bits in that domain. E.g. x ^= (x & 0xFF) >> 5) is valid for the 16-bit domain. You can specify that domain with a mask, e.g. mask = 0xFF, and your hash function becomes x = hash(i, mask). Of course you can add a "seed" value into that hash function to get different randomizations. Kensler lays out more valid operations in the paper.
So you have a reversible function x = hash(i, mask, seed). The problem is that if you hash your index, you might end up with a value that is larger than your array size, i.e. your "domain". You can't just modulo this or you'll get collisions.
The reversible hash is the key to using a technique called "cycle walking", introduced in "Ciphers with Arbitrary Finite Domains". Because the hash is reversible (i.e. 1-to-1), you can just repeatedly apply the same hash until your hashed value is smaller than your array! Because you're applying the same hash, and the mapping is one-to-one, whatever value you end up on will map back to exactly one index, so you don't have collisions. So your function could look something like this for 32-bit integers (pseudocode):
fun permute(i, length, seed) {
i = hash(i, 0xFFFF, seed)
while(i >= length): i = hash(i, 0xFFFF, seed)
return i
}
It could take a lot of hashes to get to your domain, so Kensler does a simple trick: he keeps the hash within the domain of the next power of two, which makes it require very few iterations (~2 on average), by masking out the unnecessary bits. The final algorithm looks like this:
fun next_pow_2(length) {
# This implementation is for clarity.
# See Kensler's paper for one way to do it fast.
p = 1
while (p < length): p *= 2
return p
}
permute(i, length, seed) {
mask = next_pow_2(length)-1
i = hash(i, mask, seed) & mask
while(i >= length): i = hash(i, mask, seed) & mask
return i
}
And that's it! Obviously the important thing here is choosing a good hash function, which Kensler provides in the paper but I wanted to break down the explanation. If you want to have different random permutations each time, you can add a "seed" value to the permute function which then gets passed to the hash function.

Resources