how to use Linq to generate unique random number - linq

here is my Linq code to generate a list of random numbers which contains 10 numbers ranging from 0 to 20
Random rand = new Random();
var randomSeq = Enumerable.Repeat(0, 10).Select(i => rand.Next(0,20));
Result:
6
19
18
7
18
12
12
9
2
18
as you can see i have three 18s and two 12s..
I have tried to use Distinct() function, but it will not fill up the list (e.g only fill up 8 out of 10 numbers)
Question: How can I generate unique number (i.e non repeatable numbers )
Many thanks

You want to generate a random permutation of the numbers 0 to 19 and pick 10 of these numbers. The standard algorithm for generating a random permutation is Fisher-Yates shuffle. After generating a random permutation you can just pick the first 10 numbers.
It is not to hard to come up with an ad-hoc algorithm like repeatedly choosing a new number if a collision occured but they usually fail to have good statistical properties, have nondeterministic runtime or don't even guarantee termination in the worst case.
Note that this solution is no good choice if the numbers are of different order. Generating a permuation of the numbers below a million only to pick ten is not the smartest thing one can do.
UPDATE
I just realized that you can just stop the algorithm after generating the first ten elements of the permutation - there is no need to build the whole permutation.

In functional programming it is usual to create infinite sequences. It might sound a little bizarre at first but it can be very usefull at some situations. Supose you have an extention as such:
public static class EnumerableExtentions
{
public static IEnumerable<T> Infinite<T>(Func<int, T> generator)
{
int count = 0;
checked {
while (true)
yield return generator(count++);
}
}
}
I can use it to create infinite sequences like:
var oddNumbers = EnumerableExtentions.Infinite(n => 2*n + 1)
That is an infinite sequence of all odd numbers. I could take only the first 10, for example:
oddNumbers.Take(10);
would yield:
1 3 5 7 9 11 13 15 17 19
Because of the defered execution, we don´t get a StackOverflowException (you gotta be carefull though).
The same principle can be used to create an infinite random sequence, distinct it and then taking the first 10:
var r = new Random();
var randomNumbers = EnumerableExtentions
.Infinite(i=> r.Next (0, 20))
.Distinct()
.Take(10);
If you need, you can make an OrderBy(s=>s) at the end.

At LINQ exchange, they discuss a method of randomly reordering a list with LINQ and give a code example which will generate a random permutation of the numbers you want.
They say (paraphrasing, and adapted for this problem):
Randomly Sort a List Array With LINQ OrderBy
// create and populate the original list with 20 elements
List<int> MyList = new List<int>(20);
for (int i = 0; i < 20; i++)
MyList.Add(i);
// use System.GUID to generate a new GUID for each item in the list
List<int> RandomList = MyList.OrderBy(x => System.Guid.NewGuid()).ToList();
LINQ OrderBy will then sort the array by the list of GUID's returned.
Now you can just take the first 10 elements of the list, and you've got your solution.
They note that using the System.Guid.NewGuid() yields the same distribution spread as the Fisher-Yates shuffle algorithm, and this way you won't have to actually implement the algorithm yourself.

Why not do:
Enumerable.Range(0, 20)
.OrderBy(x => Guid.NewGuid().GetHashCode())
.Distinct()
.Take(10)
.ToArray();

How about using a utility enumerable method:
static IEnumerable<int> RandomNumbersBetween(int min, int max)
{
int availableNumbers = (max - min) + 1 ;
int yieldedNumbers = 0;
Random rand = new Random();
Dictionary<int, object> used = new Dictionary<int, object>();
while (true)
{
int n = rand.Next(min, max+1); //Random.Next max value is exclusive, so add one
if (!used.ContainsKey(n))
{
yield return n;
used.Add(n, null);
if (++yieldedNumbers == availableNumbers)
yield break;
}
}
}
Because it returns IEnumerable, you can use it with LINQ and IEnumerable extension methods:
RandomNumbersBetween(0, 20).Take(10)
Or maybe take odd numbers only:
RandomNumbersBetween(1, 1000).Where(i => i%2 == 1).Take(100)
Et cetera.
Edit:
Note that this solution has terrible performance characteristics if you are trying to generate a full set of random numbers between min and max.
However it works efficiently if you want to generate, say 10 random numbers between 0 and 20, or even better, between 0 and 1000.
In worst-case scenario it can also take (max - min) space.

Just create a list of sequential valid numbers. Then generate a random index from this list and return (and remove from list) the number at the index.
static class Excensions
{
public static T PopAt<T>(this List<T> list, int index)
{
T ret = list[index];
list.RemoveAt(index);
return ret;
}
}
class Program
{
static void Main()
{
Random rng = new Random();
int length = 10; //sequence length
int limit = 20; //maximum value
var avail = Enumerable.Range(0, limit).ToList();
var seq = from i in Enumerable.Range(0, length)
select avail.PopAt(rng.Next(avail.Count));
}
}

store the generated result in an array, so anytime you generate e new number check if it has been generated before, if yes generate another one, otherwise take the number and save it in the array

Using a custom RepeatUntil extension and relying on closures:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApplication1
{
public static class CoolExtensions
{
public static IEnumerable<TResult> RepeatUntil<TResult>( TResult element, Func<bool> condition )
{
while (!condition())
yield return element;
}
}
class Program
{
static void Main( string[] args )
{
Random rand = new Random();
HashSet<int> numbers = new HashSet<int>();
var randomSeq = CoolExtensions.RepeatUntil( 0, () => numbers.Count >= 10).Select( i => rand.Next( 0, 20 ) ).Select( x => numbers.Add(x));
// just used to evaluate the sequence
randomSeq.ToList();
foreach (int number in numbers)
Console.WriteLine( number );
Console.ReadLine();
}
}
}

Why not order by a random? like this
var rnd = new Random();
var randomSeq = Enumerable.Range(1,20).OrderBy(r => rnd.NextDouble()).Take(10).ToList();

Can you do something like this?
Random rand = new Random();
var randomSeq = Enumerable.Range(0, 20).OrderBy(i => rand.Next(0,20)).Take(10);

Related

Get percentage unique numbers generated by random generator

I am using random generator in my python code. I want to get the percentage of unique random numbers generated over a huge range like from random(0:10^8).I need to generate 10^12 numbers What could be the efficient algorithm in terms of space complexity?
the code is similar to :
import random
dif = {}
for i in range(0,1000):
rannum = random.randint(0,50)
dif[rannum] = "True"
dif_len = len(dif)
print dif_len
per = float(dif_len)/50
print per
You have to keep track of each number the generator generates or there is no way to know whether some new number has been seen before. What is the best way to do that? It depends on how many numbers you are going to examine. For small N, use a HashSet. At some large number of N it becomes more efficient to use a bitmap.
For small N...
public class Accumulator {
private int uniqueNumbers = 0;
private int totalAccumulated = 0;
private HashSet<int> set = new HashSet<int>();
public void Add(int i) {
if (!set.Contains(i)) {
set.Add(i);
uniqueNumbers++;
}
totalAccumulated++;
}
public double PercentUnique() {
return 100.0 * uniqueNumbers / totalAccumulated;
}
}

Algorithm for finding random and unique number in a range [duplicate]

This question already has answers here:
Unique (non-repeating) random numbers in O(1)?
(22 answers)
Closed 8 years ago.
I want to know an algorithm to find unique random number which is non repeatable. Every time when I call that in program should be give a unique and random number which is not given before by that algorithm. I want to know because some time in a game or app this kind of requirements are came.
For ex. In a game I have created some objects and save all them in a array, and want to retrieve them by randomly and uniquely and not want to delete from array. This is just a scenario.
I have tried some alternative but they are not good performance wise, never got answer of this question.
How it is possible programmatically?
Thanks in advance.
Below code generates unique random numbers from 1-15. Modify as per your requirement:-
public class Main
{
int i[]= new int[15];
int x=0;
int counter;
public int getNumber()
{
return (int)((Math.random()*15)+1);
}
public int getU()
{
x = getNumber();
while(check(x))
{
x = getNumber();
}
i[counter]=x;
counter++;
return x;
}
public boolean check(int x)
{
boolean temp = false;
for(int n=0;n<=counter;n++)
{
if(i[n]==x)
{
temp = true;
break;
}
else
{
temp = false;
}
}
return temp;
}
public static void main(String args[])
{
Main obj = new Main();
for(int i=0;i!=15;i++)
{
System.out.println(obj.getU());
}
}
}
for more info see below links :-
https://community.oracle.com/message/4860317
Expand a random range from 1–5 to 1–7
The best option seems to me is to remove the returned number from the input list.
Let me explain:
Start with the whole range, for example: range = [0, 1, 2, 3, 4]
Toss a random index, let's say 3.
Now remove range[3] from range, you get range = [0, 1, 3, 4]
And so on.
Here is an example code in python:
import random
rangeStart = 0
rangeEnd = 10
rangeForExample = range(rangeStart, rangeEnd)
randomIndex = random.randrange(rangeStart, rangeEnd)
randomResult = rangeForExample[randomIndex]
rangeForExample.remove(randomResult)
This can be achieved in many ways. Here are the two of them(currently on top of my head) :
Persisting the previously generated values.(for range based random no. generation)
In this method you generate a random number and store it(either on file or db) so that when you generate next no. you can match it with the previous numbers and discard it if its already generated.
Generating a unique number every-time. (for non-range based random no. generation)
In this method you use a series or something like that which can give you unique number, current-time-millisecs for instance.
Get count of your array.
Random an index between (0, count).
Retrieve item of index in array.
Remove that item at index.
As I see that you do iOS, I would give an example in objective-C.
NSMutableArray *array = <creation of your array>;
int count = array.count;
while (1) {
int randomIndex = arc4random() % count;
id object = [array objectAtIndex:randomIndex];
NSLog(#"Random object: %#", object);
[array removeObject:object];
count--; // This is important
if(array.count == 0)
{
return;
}
}
Here are two options I could think of ..
Using a history-list
1. Keep past picked random numbers in a list
2. Find a new random number
3. If the number exist in history list, go to 2
4. [optional] If the number lower history list randomness, go to 2
5. add the number to the history list
Using jumps
At Time 0: i=0; seed(Time); R0 = random() % jump_limit
1. i++
2. Ji = random() % jump_limit
3. Ri = Ri-1 + Ji

Printing numbers of the form 2^i * 5^j in increasing order

How do you print numbers of form 2^i * 5^j in increasing order.
For eg:
1, 2, 4, 5, 8, 10, 16, 20
This is actually a very interesting question, especially if you don't want this to be N^2 or NlogN complexity.
What I would do is the following:
Define a data structure containing 2 values (i and j) and the result of the formula.
Define a collection (e.g. std::vector) containing this data structures
Initialize the collection with the value (0,0) (the result is 1 in this case)
Now in a loop do the following:
Look in the collection and take the instance with the smallest value
Remove it from the collection
Print this out
Create 2 new instances based on the instance you just processed
In the first instance increment i
In the second instance increment j
Add both instances to the collection (if they aren't in the collection yet)
Loop until you had enough of it
The performance can be easily tweaked by choosing the right data structure and collection.
E.g. in C++, you could use an std::map, where the key is the result of the formula, and the value is the pair (i,j). Taking the smallest value is then just taking the first instance in the map (*map.begin()).
I quickly wrote the following application to illustrate it (it works!, but contains no further comments, sorry):
#include <math.h>
#include <map>
#include <iostream>
typedef __int64 Integer;
typedef std::pair<Integer,Integer> MyPair;
typedef std::map<Integer,MyPair> MyMap;
Integer result(const MyPair &myPair)
{
return pow((double)2,(double)myPair.first) * pow((double)5,(double)myPair.second);
}
int main()
{
MyMap myMap;
MyPair firstValue(0,0);
myMap[result(firstValue)] = firstValue;
while (true)
{
auto it=myMap.begin();
if (it->first < 0) break; // overflow
MyPair myPair = it->second;
std::cout << it->first << "= 2^" << myPair.first << "*5^" << myPair.second << std::endl;
myMap.erase(it);
MyPair pair1 = myPair;
++pair1.first;
myMap[result(pair1)] = pair1;
MyPair pair2 = myPair;
++pair2.second;
myMap[result(pair2)] = pair2;
}
}
This is well suited to a functional programming style. In F#:
let min (a,b)= if(a<b)then a else b;;
type stream (current, next)=
member this.current = current
member this.next():stream = next();;
let rec merge(a:stream,b:stream)=
if(a.current<b.current) then new stream(a.current, fun()->merge(a.next(),b))
else new stream(b.current, fun()->merge(a,b.next()));;
let rec Squares(start) = new stream(start,fun()->Squares(start*2));;
let rec AllPowers(start) = new stream(start,fun()->merge(Squares(start*2),AllPowers(start*5)));;
let Results = AllPowers(1);;
Works well with Results then being a stream type with current value and a next method.
Walking through it:
I define min for completenes.
I define a stream type to have a current value and a method to return a new string, essentially head and tail of a stream of numbers.
I define the function merge, which takes the smaller of the current values of two streams and then increments that stream. It then recurses to provide the rest of the stream. Essentially, given two streams which are in order, it will produce a new stream which is in order.
I define squares to be a stream increasing in powers of 2.
AllPowers takes the start value and merges the stream resulting from all squares at this number of powers of 5. it with the stream resulting from multiplying it by 5, since these are your only two options. You effectively are left with a tree of results
The result is merging more and more streams, so you merge the following streams
1, 2, 4, 8, 16, 32...
5, 10, 20, 40, 80, 160...
25, 50, 100, 200, 400...
.
.
.
Merging all of these turns out to be fairly efficient with tail recursio and compiler optimisations etc.
These could be printed to the console like this:
let rec PrintAll(s:stream)=
if (s.current > 0) then
do System.Console.WriteLine(s.current)
PrintAll(s.next());;
PrintAll(Results);
let v = System.Console.ReadLine();
Similar things could be done in any language which allows for recursion and passing functions as values (it's only a little more complex if you can't pass functions as variables).
For an O(N) solution, you can use a list of numbers found so far and two indexes: one representing the next number to be multiplied by 2, and the other the next number to be multiplied by 5. Then in each iteration you have two candidate values to choose the smaller one from.
In Python:
numbers = [1]
next_2 = 0
next_5 = 0
for i in xrange(100):
mult_2 = numbers[next_2]*2
mult_5 = numbers[next_5]*5
if mult_2 < mult_5:
next = mult_2
next_2 += 1
else:
next = mult_5
next_5 += 1
# The comparison here is to avoid appending duplicates
if next > numbers[-1]:
numbers.append(next)
print numbers
So we have two loops, one incrementing i and second one incrementing j starting both from zero, right? (multiply symbol is confusing in the title of the question)
You can do something very straightforward:
Add all items in an array
Sort the array
Or you need an other solution with more math analysys?
EDIT: More smart solution by leveraging similarity with Merge Sort problem
If we imagine infinite set of numbers of 2^i and 5^j as two independent streams/lists this problem looks very the same as well known Merge Sort problem.
So solution steps are:
Get two numbers one from the each of streams (of 2 and of 5)
Compare
Return smallest
get next number from the stream of the previously returned smallest
and that's it! ;)
PS: Complexity of Merge Sort always is O(n*log(n))
I visualize this problem as a matrix M where M(i,j) = 2^i * 5^j. This means that both the rows and columns are increasing.
Think about drawing a line through the entries in increasing order, clearly beginning at entry (1,1). As you visit entries, the row and column increasing conditions ensure that the shape formed by those cells will always be an integer partition (in English notation). Keep track of this partition (mu = (m1, m2, m3, ...) where mi is the number of smaller entries in row i -- hence m1 >= m2 >= ...). Then the only entries that you need to compare are those entries which can be added to the partition.
Here's a crude example. Suppose you've visited all the xs (mu = (5,3,3,1)), then you need only check the #s:
x x x x x #
x x x #
x x x
x #
#
Therefore the number of checks is the number of addable cells (equivalently the number of ways to go up in Bruhat order if you're of a mind to think in terms of posets).
Given a partition mu, it's easy to determine what the addable states are. Image an infinite string of 0s following the last positive entry. Then you can increase mi by 1 if and only if m(i-1) > mi.
Back to the example, for mu = (5,3,3,1) we can increase m1 (6,3,3,1) or m2 (5,4,3,1) or m4 (5,3,3,2) or m5 (5,3,3,1,1).
The solution to the problem then finds the correct sequence of partitions (saturated chain). In pseudocode:
mu = [1,0,0,...,0];
while (/* some terminate condition or go on forever */) {
minNext = 0;
nextCell = [];
// look through all addable cells
for (int i=0; i<mu.length; ++i) {
if (i==0 or mu[i-1]>mu[i]) {
// check for new minimum value
if (minNext == 0 or 2^i * 5^(mu[i]+1) < minNext) {
nextCell = i;
minNext = 2^i * 5^(mu[i]+1)
}
}
}
// print next largest entry and update mu
print(minNext);
mu[i]++;
}
I wrote this in Maple stopping after 12 iterations:
1, 2, 4, 5, 8, 10, 16, 20, 25, 32, 40, 50
and the outputted sequence of cells added and got this:
1 2 3 5 7 10
4 6 8 11
9 12
corresponding to this matrix representation:
1, 2, 4, 8, 16, 32...
5, 10, 20, 40, 80, 160...
25, 50, 100, 200, 400...
First of all, (as others mentioned already) this question is very vague!!!
Nevertheless, I am going to give a shot based on your vague equation and the pattern as your expected result. So I am not sure the following will be true for what you are trying to do, however it may give you some idea about java collections!
import java.util.List;
import java.util.ArrayList;
import java.util.SortedSet;
import java.util.TreeSet;
public class IncreasingNumbers {
private static List<Integer> findIncreasingNumbers(int maxIteration) {
SortedSet<Integer> numbers = new TreeSet<Integer>();
SortedSet<Integer> numbers2 = new TreeSet<Integer>();
for (int i=0;i < maxIteration;i++) {
int n1 = (int)Math.pow(2, i);
numbers.add(n1);
for (int j=0;j < maxIteration;j++) {
int n2 = (int)Math.pow(5, i);
numbers.add(n2);
for (Integer n: numbers) {
int n3 = n*n1;
numbers2.add(n3);
}
}
}
numbers.addAll(numbers2);
return new ArrayList<Integer>(numbers);
}
/**
* Based on the following fuzzy question # StackOverflow
* http://stackoverflow.com/questions/7571934/printing-numbers-of-the-form-2i-5j-in-increasing-order
*
*
* Result:
* 1 2 4 5 8 10 16 20 25 32 40 64 80 100 125 128 200 256 400 625 1000 2000 10000
*/
public static void main(String[] args) {
List<Integer> numbers = findIncreasingNumbers(5);
for (Integer i: numbers) {
System.out.print(i + " ");
}
}
}
If you can do it in O(nlogn), here's a simple solution:
Get an empty min-heap
Put 1 in the heap
while (you want to continue)
Get num from heap
print num
put num*2 and num*5 in the heap
There you have it. By min-heap, I mean min-heap
As a mathematician the first thing I always think about when looking at something like this is "will logarithms help?".
In this case it might.
If our series A is increasing then the series log(A) is also increasing. Since all terms of A are of the form 2^i.5^j then all members of the series log(A) are of the form i.log(2) + j.log(5)
We can then look at the series log(A)/log(2) which is also increasing and its elements are of the form i+j.(log(5)/log(2))
If we work out the i and j that generates the full ordered list for this last series (call it B) then that i and j will also generate the series A correctly.
This is just changing the nature of the problem but hopefully to one where it becomes easier to solve. At each step you can either increase i and decrease j or vice versa.
Looking at a few of the early changes you can make (which I will possibly refer to as transforms of i,j or just transorms) gives us some clues of where we are going.
Clearly increasing i by 1 will increase B by 1. However, given that log(5)/log(2) is approx 2.3 then increasing j by 1 while decreasing i by 2 will given an increase of just 0.3 . The problem then is at each stage finding the minimum possible increase in B for changes of i and j.
To do this I just kept a record as I increased of the most efficient transforms of i and j (ie what to add and subtract from each) to get the smallest possible increase in the series. Then applied whichever one was valid (ie making sure i and j don't go negative).
Since at each stage you can either decrease i or decrease j there are effectively two classes of transforms that can be checked individually. A new transform doesn't have to have the best overall score to be included in our future checks, just better than any other in its class.
To test my thougths I wrote a sort of program in LinqPad. Key things to note are that the Dump() method just outputs the object to screen and that the syntax/structure isn't valid for a real c# file. Converting it if you want to run it should be easy though.
Hopefully anything not explicitly explained will be understandable from the code.
void Main()
{
double C = Math.Log(5)/Math.Log(2);
int i = 0;
int j = 0;
int maxi = i;
int maxj = j;
List<int> outputList = new List<int>();
List<Transform> transforms = new List<Transform>();
outputList.Add(1);
while (outputList.Count<500)
{
Transform tr;
if (i==maxi)
{
//We haven't considered i this big before. Lets see if we can find an efficient transform by getting this many i and taking away some j.
maxi++;
tr = new Transform(maxi, (int)(-(maxi-maxi%C)/C), maxi%C);
AddIfWorthwhile(transforms, tr);
}
if (j==maxj)
{
//We haven't considered j this big before. Lets see if we can find an efficient transform by getting this many j and taking away some i.
maxj++;
tr = new Transform((int)(-(maxj*C)), maxj, (maxj*C)%1);
AddIfWorthwhile(transforms, tr);
}
//We have a set of transforms. We first find ones that are valid then order them by score and take the first (smallest) one.
Transform bestTransform = transforms.Where(x=>x.I>=-i && x.J >=-j).OrderBy(x=>x.Score).First();
//Apply transform
i+=bestTransform.I;
j+=bestTransform.J;
//output the next number in out list.
int value = GetValue(i,j);
//This line just gets it to stop when it overflows. I would have expected an exception but maybe LinqPad does magic with them?
if (value<0) break;
outputList.Add(value);
}
outputList.Dump();
}
public int GetValue(int i, int j)
{
return (int)(Math.Pow(2,i)*Math.Pow(5,j));
}
public void AddIfWorthwhile(List<Transform> list, Transform tr)
{
if (list.Where(x=>(x.Score<tr.Score && x.IncreaseI == tr.IncreaseI)).Count()==0)
{
list.Add(tr);
}
}
// Define other methods and classes here
public class Transform
{
public int I;
public int J;
public double Score;
public bool IncreaseI
{
get {return I>0;}
}
public Transform(int i, int j, double score)
{
I=i;
J=j;
Score=score;
}
}
I've not bothered looking at the efficiency of this but I strongly suspect its better than some other solutions because at each stage all I need to do is check my set of transforms - working out how many of these there are compared to "n" is non-trivial. It is clearly related since the further you go the more transforms there are but the number of new transforms becomes vanishingly small at higher numbers so maybe its just O(1). This O stuff always confused me though. ;-)
One advantage over other solutions is that it allows you to calculate i,j without needing to calculate the product allowing me to work out what the sequence would be without needing to calculate the actual number itself.
For what its worth after the first 230 nunmbers (when int runs out of space) I had 9 transforms to check each time. And given its only my total that overflowed I ran if for the first million results and got to i=5191 and j=354. The number of transforms was 23. The size of this number in the list is approximately 10^1810. Runtime to get to this level was approx 5 seconds.
P.S. If you like this answer please feel free to tell your friends since I spent ages on this and a few +1s would be nice compensation. Or in fact just comment to tell me what you think. :)
I'm sure everyone one's might have got the answer by now, but just wanted to give a direction to this solution..
It's a Ctrl C + Ctrl V from
http://www.careercup.com/question?id=16378662
void print(int N)
{
int arr[N];
arr[0] = 1;
int i = 0, j = 0, k = 1;
int numJ, numI;
int num;
for(int count = 1; count < N; )
{
numI = arr[i] * 2;
numJ = arr[j] * 5;
if(numI < numJ)
{
num = numI;
i++;
}
else
{
num = numJ;
j++;
}
if(num > arr[k-1])
{
arr[k] = num;
k++;
count++;
}
}
for(int counter = 0; counter < N; counter++)
{
printf("%d ", arr[counter]);
}
}
The question as put to me was to return an infinite set of solutions. I pondered the use of trees, but felt there was a problem with figuring out when to harvest and prune the tree, given an infinite number of values for i & j. I realized that a sieve algorithm could be used. Starting from zero, determine whether each positive integer had values for i and j. This was facilitated by turning answer = (2^i)*(2^j) around and solving for i instead. That gave me i = log2 (answer/ (5^j)). Here is the code:
class Program
{
static void Main(string[] args)
{
var startTime = DateTime.Now;
int potential = 0;
do
{
if (ExistsIandJ(potential))
Console.WriteLine("{0}", potential);
potential++;
} while (potential < 100000);
Console.WriteLine("Took {0} seconds", DateTime.Now.Subtract(startTime).TotalSeconds);
}
private static bool ExistsIandJ(int potential)
{
// potential = (2^i)*(5^j)
// 1 = (2^i)*(5^j)/potential
// 1/(2^1) = (5^j)/potential or (2^i) = potential / (5^j)
// i = log2 (potential / (5^j))
for (var j = 0; Math.Pow(5,j) <= potential; j++)
{
var i = Math.Log(potential / Math.Pow(5, j), 2);
if (i == Math.Truncate(i))
return true;
}
return false;
}
}

LINQ implementation of Cartesian Product with pruning

I hope someone is able to help me with what is, at least to me, quite a tricky algorithm.
The Problem
I have a List (1 <= size <= 5, but size unknown until run-time) of Lists (1 <= size <= 2) that I need to combine. Here is an example of what I am looking at:-
ListOfLists = { {1}, {2,3}, {2,3}, {4}, {2,3} }
So, there are 2 stages to what I need to do:-
(1). I need to combine the inner lists in such a way that any combination has exactly ONE item from each list, that is, the possible combinations in the result set here would be:-
1,2,2,4,2
1,2,2,4,3
1,2,3,4,2
1,2,3,4,3
1,3,2,4,2
1,3,2,4,3
1,3,3,4,2
1,3,3,4,3
The Cartesian Product takes care of this, so stage 1 is done.....now, here comes the twist which I can't figure out - at least I can't figure out a LINQ way of doing it (I am still a LINQ noob).
(2). I now need to filter out any duplicate results from this Cartesian Product. A duplicate in this case constitutes any line in the result set with the same quantity of each distinct list element as another line, that is,
1,2,2,4,3 is the "same" as 1,3,2,4,2
because each distinct item within the first list occurs the same number of times in both lists (1 occurs once in each list, 2 appears twice in each list, ....
The final result set should therefore look like this...
1,2,2,4,2
1,2,2,4,3
--
1,2,3,4,3
--
--
--
1,3,3,4,3
Another example is the worst-case scenario (from a combination point of view) where the ListOfLists is {{2,3}, {2,3}, {2,3}, {2,3}, {2,3}}, i.e. a list containing inner lists of the maximum size - in this case there would obviously be 32 results in the Cartesian Product result-set, but the pruned result-set that I am trying to get at would just be:-
2,2,2,2,2
2,2,2,2,3 <-- all other results with four 2's and one 3 (in any order) are suppressed
2,2,2,3,3 <-- all other results with three 2's and two 3's are suppressed, etc
2,2,3,3,3
2,3,3,3,3
3,3,3,3,3
To any mathematically-minded folks out there - I hope you can help. I have actually got a working solution to part 2, but it is a total hack and is computationally-intensive, and I am looking for guidance in finding a more elegant, and efficient LINQ solution to the issue of pruning.
Thanks for reading.
pip
Some resources used so far (to get the Cartesian Product)
computing-a-cartesian-product-with-linq
c-permutation-of-an-array-of-arraylists
msdn
UPDATE - The Solution
Apologies for not posting this sooner...see below
You should implement your own IEqualityComparer<IEnumerable<int>> and then use that in Distinct().
The choice of hash code in the IEqualityComparer depends on your actual data, but I think something like this should be adequate if your actual data resemble those in your examples:
class UnorderedQeuenceComparer : IEqualityComparer<IEnumerable<int>>
{
public bool Equals(IEnumerable<int> x, IEnumerable<int> y)
{
return x.OrderBy(i => i).SequenceEqual(y.OrderBy(i => i));
}
public int GetHashCode(IEnumerable<int> obj)
{
return obj.Sum(i => i * i);
}
}
The important part is that GetHashCode() should be O(N), sorting would be too slow.
void Main()
{
var query = from a in new int[] { 1 }
from b in new int[] { 2, 3 }
from c in new int[] { 2, 3 }
from d in new int[] { 4 }
from e in new int[] { 2, 3 }
select new int[] { a, b, c, d, e };
query.Distinct(new ArrayComparer());
//.Dump();
}
public class ArrayComparer : IEqualityComparer<int[]>
{
public bool Equals(int[] x, int[] y)
{
if (x == null || y == null)
return false;
return x.OrderBy(i => i).SequenceEqual<int>(y.OrderBy(i => i));
}
public int GetHashCode(int[] obj)
{
if ( obj == null || obj.Length == 0)
return 0;
var hashcode = obj[0];
for (int i = 1; i < obj.Length; i++)
{
hashcode ^= obj[i];
}
return hashcode;
}
}
The finalised solution to the whole combining of multisets, then pruning the result-sets to remove duplicates problem ended up in a helper class as a static method. It takes svick's much appreciated answer and injects the IEqualityComparer dependency into the existing CartesianProduct answer I found at Eric Lipperts's blog here (I'd recommend reading his post as it explains the iterations in his thinking and why the linq implimentation is the best).
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(IEnumerable<IEnumerable<T>> sequences,
IEqualityComparer<IEnumerable<T>> sequenceComparer)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
var resultsSet = sequences.Aggregate(emptyProduct, (accumulator, sequence) => from accseq in accumulator
from item in sequence
select accseq.Concat(new[] { item }));
if (sequenceComparer != null)
return resultsSet.Distinct(sequenceComparer);
else
return resultsSet;
}

Find out which combinations of numbers in a set add up to a given total

I've been tasked with helping some accountants solve a common problem they have - given a list of transactions and a total deposit, which transactions are part of the deposit? For example, say I have this list of numbers:
1.00
2.50
3.75
8.00
And I know that my total deposit is 10.50, I can easily see that it's made up of the 8.00 and 2.50 transaction. However, given a hundred transactions and a deposit in the millions, it quickly becomes much more difficult.
In testing a brute force solution (which takes way too long to be practical), I had two questions:
With a list of about 60 numbers, it seems to find a dozen or more combinations for any total that's reasonable. I was expecting a single combination to satisfy my total, or maybe a few possibilities, but there always seem to be a ton of combinations. Is there a math principle that describes why this is? It seems that given a collection of random numbers of even a medium size, you can find a multiple combination that adds up to just about any total you want.
I built a brute force solution for the problem, but it's clearly O(n!), and quickly grows out of control. Aside from the obvious shortcuts (exclude numbers larger than the total themselves), is there a way to shorten the time to calculate this?
Details on my current (super-slow) solution:
The list of detail amounts is sorted largest to smallest, and then the following process runs recursively:
Take the next item in the list and see if adding it to your running total makes your total match the target. If it does, set aside the current chain as a match. If it falls short of your target, add it to your running total, remove it from the list of detail amounts, and then call this process again
This way it excludes the larger numbers quickly, cutting the list down to only the numbers it needs to consider. However, it's still n! and larger lists never seem to finish, so I'm interested in any shortcuts I might be able to take to speed this up - I suspect that even cutting 1 number out of the list would cut the calculation time in half.
Thanks for your help!
This special case of the Knapsack problem is called Subset Sum.
C# version
setup test:
using System;
using System.Collections.Generic;
public class Program
{
public static void Main(string[] args)
{
// subtotal list
List<double> totals = new List<double>(new double[] { 1, -1, 18, 23, 3.50, 8, 70, 99.50, 87, 22, 4, 4, 100.50, 120, 27, 101.50, 100.50 });
// get matches
List<double[]> results = Knapsack.MatchTotal(100.50, totals);
// print results
foreach (var result in results)
{
Console.WriteLine(string.Join(",", result));
}
Console.WriteLine("Done.");
Console.ReadKey();
}
}
code:
using System.Collections.Generic;
using System.Linq;
public class Knapsack
{
internal static List<double[]> MatchTotal(double theTotal, List<double> subTotals)
{
List<double[]> results = new List<double[]>();
while (subTotals.Contains(theTotal))
{
results.Add(new double[1] { theTotal });
subTotals.Remove(theTotal);
}
// if no subtotals were passed
// or all matched the Total
// return
if (subTotals.Count == 0)
return results;
subTotals.Sort();
double mostNegativeNumber = subTotals[0];
if (mostNegativeNumber > 0)
mostNegativeNumber = 0;
// if there aren't any negative values
// we can remove any values bigger than the total
if (mostNegativeNumber == 0)
subTotals.RemoveAll(d => d > theTotal);
// if there aren't any negative values
// and sum is less than the total no need to look further
if (mostNegativeNumber == 0 && subTotals.Sum() < theTotal)
return results;
// get the combinations for the remaining subTotals
// skip 1 since we already removed subTotals that match
for (int choose = 2; choose <= subTotals.Count; choose++)
{
// get combinations for each length
IEnumerable<IEnumerable<double>> combos = Combination.Combinations(subTotals.AsEnumerable(), choose);
// add combinations where the sum mathces the total to the result list
results.AddRange(from combo in combos
where combo.Sum() == theTotal
select combo.ToArray());
}
return results;
}
}
public static class Combination
{
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int choose)
{
return choose == 0 ? // if choose = 0
new[] { new T[0] } : // return empty Type array
elements.SelectMany((element, i) => // else recursively iterate over array to create combinations
elements.Skip(i + 1).Combinations(choose - 1).Select(combo => (new[] { element }).Concat(combo)));
}
}
results:
100.5
100.5
-1,101.5
1,99.5
3.5,27,70
3.5,4,23,70
3.5,4,23,70
-1,1,3.5,27,70
1,3.5,4,22,70
1,3.5,4,22,70
1,3.5,8,18,70
-1,1,3.5,4,23,70
-1,1,3.5,4,23,70
1,3.5,4,4,18,70
-1,3.5,8,18,22,23,27
-1,3.5,4,4,18,22,23,27
Done.
If subTotals are repeated, there will appear to be duplicate results (the desired effect). In reality, you will probably want to use the subTotal Tupled with some ID, so you can relate it back to your data.
If I understand your problem correctly, you have a set of transactions, and you merely wish to know which of them could have been included in a given total. So if there are 4 possible transactions, then there are 2^4 = 16 possible sets to inspect. This problem is, for 100 possible transactions, the search space has 2^100 = 1267650600228229401496703205376 possible combinations to search over. For 1000 potential transactions in the mix, it grows to a total of
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
sets that you must test. Brute force will hardly be a viable solution on these problems.
Instead, use a solver that can handle knapsack problems. But even then, I'm not sure that you can generate a complete enumeration of all possible solutions without some variation of brute force.
There is a cheap Excel Add-in that solves this problem: SumMatch
The Excel Solver Addin as posted over on superuser.com has a great solution (if you have Excel) https://superuser.com/questions/204925/excel-find-a-subset-of-numbers-that-add-to-a-given-total
Its kind of like 0-1 Knapsack problem which is NP-complete and can be solved through dynamic programming in polynomial time.
http://en.wikipedia.org/wiki/Knapsack_problem
But at the end of the algorithm you also need to check that the sum is what you wanted.
Depending on your data you could first look at the cents portion of each transaction. Like in your initial example you know that 2.50 has to be part of the total because it is the only set of non-zero cent transactions which add to 50.
Not a super efficient solution but heres an implementation in coffeescript
combinations returns all possible combinations of the elements in list
combinations = (list) ->
permuations = Math.pow(2, list.length) - 1
out = []
combinations = []
while permuations
out = []
for i in [0..list.length]
y = ( 1 << i )
if( y & permuations and (y isnt permuations))
out.push(list[i])
if out.length <= list.length and out.length > 0
combinations.push(out)
permuations--
return combinations
and then find_components makes use of it to determine which numbers add up to total
find_components = (total, list) ->
# given a list that is assumed to have only unique elements
list_combinations = combinations(list)
for combination in list_combinations
sum = 0
for number in combination
sum += number
if sum is total
return combination
return []
Heres an example
list = [7.2, 3.3, 4.5, 6.0, 2, 4.1]
total = 7.2 + 2 + 4.1
console.log(find_components(total, list))
which returns [ 7.2, 2, 4.1 ]
#include <stdio.h>
#include <stdlib.h>
/* Takes at least 3 numbers as arguments.
* First number is desired sum.
* Find the subset of the rest that comes closest
* to the desired sum without going over.
*/
static long *elements;
static int nelements;
/* A linked list of some elements, not necessarily all */
/* The list represents the optimal subset for elements in the range [index..nelements-1] */
struct status {
long sum; /* sum of all the elements in the list */
struct status *next; /* points to next element in the list */
int index; /* index into elements array of this element */
};
/* find the subset of elements[startingat .. nelements-1] whose sum is closest to but does not exceed desiredsum */
struct status *reportoptimalsubset(long desiredsum, int startingat) {
struct status *sumcdr = NULL;
struct status *sumlist = NULL;
/* sum of zero elements or summing to zero */
if (startingat == nelements || desiredsum == 0) {
return NULL;
}
/* optimal sum using the current element */
/* if current elements[startingat] too big, it won't fit, don't try it */
if (elements[startingat] <= desiredsum) {
sumlist = malloc(sizeof(struct status));
sumlist->index = startingat;
sumlist->next = reportoptimalsubset(desiredsum - elements[startingat], startingat + 1);
sumlist->sum = elements[startingat] + (sumlist->next ? sumlist->next->sum : 0);
if (sumlist->sum == desiredsum)
return sumlist;
}
/* optimal sum not using current element */
sumcdr = reportoptimalsubset(desiredsum, startingat + 1);
if (!sumcdr) return sumlist;
if (!sumlist) return sumcdr;
return (sumcdr->sum < sumlist->sum) ? sumlist : sumcdr;
}
int main(int argc, char **argv) {
struct status *result = NULL;
long desiredsum = strtol(argv[1], NULL, 10);
nelements = argc - 2;
elements = malloc(sizeof(long) * nelements);
for (int i = 0; i < nelements; i++) {
elements[i] = strtol(argv[i + 2], NULL , 10);
}
result = reportoptimalsubset(desiredsum, 0);
if (result)
printf("optimal subset = %ld\n", result->sum);
while (result) {
printf("%ld + ", elements[result->index]);
result = result->next;
}
printf("\n");
}
Best to avoid use of floats and doubles when doing arithmetic and equality comparisons btw.

Resources