Roulette wheel selection algorithm [duplicate] - algorithm

Can anyone provide some pseudocode for a roulette selection function? How would I implement this? I don't really understand how to read the math notation. I want a general algorithm for it.

The other answers seem to be assuming that you are trying to implement a roulette game. I think that you are asking about roulette wheel selection in evolutionary algorithms.
Here is some Java code that implements roulette wheel selection.
Assume you have 10 items to choose from and you choose by generating a random number between 0 and 1. You divide the range 0 to 1 up into ten non-overlapping segments, each proportional to the fitness of one of the ten items. For example, this might look like this:
0 - 0.3 is item 1
0.3 - 0.4 is item 2
0.4 - 0.5 is item 3
0.5 - 0.57 is item 4
0.57 - 0.63 is item 5
0.63 - 0.68 is item 6
0.68 - 0.8 is item 7
0.8 - 0.85 is item 8
0.85 - 0.98 is item 9
0.98 - 1 is item 10
This is your roulette wheel. Your random number between 0 and 1 is your spin. If the random number is 0.46, then the chosen item is item 3. If it's 0.92, then it's item 9.
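The wheel described above can be sketched in Python as a lookup over the cumulative upper bounds of the ten segments. This is only a minimal sketch and not the Java code referred to earlier; the names `UPPER_BOUNDS` and `spin` are made up for illustration, and `bisect.bisect_right` is used here in place of a linear scan.

```python
import bisect
import random

# Upper bound of each segment of the example wheel (items 1..10).
UPPER_BOUNDS = [0.3, 0.4, 0.5, 0.57, 0.63, 0.68, 0.8, 0.85, 0.98, 1.0]

def spin(r=None):
    """Return the 1-based number of the item whose segment contains r."""
    if r is None:
        r = random.random()  # uniform in [0, 1)
    # Index of the first segment whose upper bound is greater than r.
    index = bisect.bisect_right(UPPER_BOUNDS, r)
    # Clamp so that r == 1.0 still maps to the last item.
    return min(index, len(UPPER_BOUNDS) - 1) + 1

print(spin(0.46))  # item 3, as in the text
print(spin(0.92))  # item 9, as in the text
```

Calling `spin()` without an argument performs a random spin.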

Here is a bit of Python code:
import random

def roulette_select(population, fitnesses, num):
    """ Roulette selection, implemented according to:
        <http://stackoverflow.com/questions/177271/roulette-selection-in-genetic-algorithms/177278#177278>
    """
    total_fitness = float(sum(fitnesses))
    rel_fitness = [f/total_fitness for f in fitnesses]
    # Generate probability intervals for each individual
    probs = [sum(rel_fitness[:i+1]) for i in range(len(rel_fitness))]
    # Draw new population
    new_population = []
    for n in range(num):
        # random.random() stands in for the undefined rand() of the original
        r = random.random()
        for (i, individual) in enumerate(population):
            if r <= probs[i]:
                new_population.append(individual)
                break
    return new_population

First, generate an array of the percentages you assigned, say p[1..n], and let total be the sum of all the percentages.
Then get a random number r between 1 and total.
Now, the algorithm in Lua:
local c = 0
for i = 1,n do
    c = c + p[i]
    if r <= c then
        return i
    end
end

There are 2 steps to this: First create an array with all the values on the wheel. This can be a 2 dimensional array with colour as well as number, or you can choose to add 100 to red numbers.
Then simply generate a random integer between 0 or 1 (depending on whether your language starts numbering array indexes from 0 or 1) and the index of the last element in your array.
Most languages have built-in random number functions. In VB and VBScript the function is RND(). In JavaScript it is Math.random().
Fetch the value from that position in the array and you have your random roulette number.
Final note: don't forget to seed your random number generator or you will get the same sequence of draws every time you run the program.

Here is a really quick way to do it using stream selection in Java. It selects the indices of an array using the values as weights. No cumulative weights needed due to the mathematical properties.
static int selectRandomWeighted(double[] wts, Random rnd) {
    int selected = 0;
    double total = wts[0];
    for( int i = 1; i < wts.length; i++ ) {
        total += wts[i];
        if( rnd.nextDouble() <= (wts[i] / total)) selected = i;
    }
    return selected;
}
This could be further improved using Kahan summation or reading through the doubles as an iterable if the array was too big to initialize at once.
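For comparison, the same one-pass idea can be sketched in Python. The reason no cumulative array is needed is that index i survives as the final pick with probability (w_i / T_i) times the product over later j of (1 - w_j / T_j), where T_j is the running total up to j; that product telescopes to w_i / T_n. The function name `select_random_weighted` and the optional `rng` parameter are assumptions made for this sketch, and it multiplies instead of dividing so that leading zero weights do not cause a division by zero.

```python
import random

def select_random_weighted(weights, rng=random):
    """One-pass weighted pick: index i is returned with probability weights[i] / sum(weights)."""
    selected = 0
    total = weights[0]
    for i in range(1, len(weights)):
        total += weights[i]
        # Replace the current pick with probability weights[i] / total.
        # Written as a product so that total == 0 does not raise an error.
        if rng.random() * total < weights[i]:
            selected = i
    return selected

# Rough check: index 1 should be picked about three times as often as index 0.
rng = random.Random(0)
counts = [0, 0]
for _ in range(10000):
    counts[select_random_weighted([1.0, 3.0], rng)] += 1
print(counts)
```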

I wanted the same and so created this self-contained Roulette class. You give it a series of weights (in the form of a double array), and it will simply return an index from that array according to a weighted random pick.
I created a class because you can get a big speed up by only doing the cumulative additions once via the constructor. It's C# code, but enjoy the C like speed and simplicity!
class Roulette
{
    double[] c;
    double total;
    Random random;
    public Roulette(double[] n) {
        random = new Random();
        total = 0;
        c = new double[n.Length+1];
        c[0] = 0;
        // Create cumulative values for later:
        for (int i = 0; i < n.Length; i++) {
            c[i+1] = c[i] + n[i];
            total += n[i];
        }
    }
    public int spin() {
        double r = random.NextDouble() * total; // Create a random number between 0 and 1 and times by the total we calculated earlier.
        //int j; for (j = 0; j < c.Length; j++) if (c[j] > r) break; return j-1; // Don't use this - it's slower than the binary search below.
        //// Binary search for efficiency. Objective is to find index of the number just above r:
        int a = 0;
        int b = c.Length - 1;
        while (b - a > 1) {
            int mid = (a + b) / 2;
            if (c[mid] > r) b = mid;
            else a = mid;
        }
        return a;
    }
}
The initial weights are up to you. Maybe it could be the fitness of each member, or a value inversely proportional to the member's position in the "top 50". E.g.: 1st place = 1.0 weighting, 2nd place = 0.5, 3rd place = 0.333, 4th place = 0.25 weighting etc. etc.
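As an illustration of the rank-based weighting mentioned above (1st place = 1.0, 2nd place = 0.5, and so on), here is a minimal Python sketch. It is not a port of the C# class: `rank_weights` and the `ranked` list are hypothetical names, and the standard library function `random.choices` does the weighted pick.

```python
import random

def rank_weights(n):
    """Weights 1, 1/2, 1/3, ... for ranks 1..n (best member first)."""
    return [1.0 / rank for rank in range(1, n + 1)]

# Hypothetical population, already sorted from best to worst.
ranked = ["first", "second", "third", "fourth"]
weights = rank_weights(len(ranked))

# random.choices draws with replacement, proportionally to the weights.
picks = random.choices(ranked, weights=weights, k=5)
print(weights)
print(picks)
```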

Well, for an American Roulette wheel, you're going to need to generate a random integer between 1 and 38. There are 36 numbers, a 0, and a 00.
One of the big things to consider, though, is that in American roulette, there are many different bets that can be made. A single bet can cover 1, 2, 3, 4, 5, 6, two different 12s, or 18 numbers. You may wish to create a list of lists where each number has additional flags to simplify that, or do it all in the programming.
If I were implementing it in Python, I would just create a tuple of 0, 00, and 1 through 36 and use random.choice() for each spin.
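A minimal sketch of that suggestion might look like the following. The names `POCKETS` and `spin_wheel` are made up for illustration, and the pockets are stored as strings so that 0 and 00 stay distinct.

```python
import random

# Pockets of an American wheel: 0, 00 and 1 through 36 (38 in total).
POCKETS = ("0", "00") + tuple(str(n) for n in range(1, 37))

def spin_wheel(rng=random):
    """Return one pocket; every pocket is equally likely."""
    return rng.choice(POCKETS)

print(len(POCKETS))
print(spin_wheel())
```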

This assumes some class "Classifier" which just has a String condition, String message, and double strength. Just follow the logic.
-- Paul
public static List<Classifier> rouletteSelection(int classifiers) {
    List<Classifier> classifierList = new LinkedList<Classifier>();
    double strengthSum = 0.0;
    double probabilitySum = 0.0;
    // add up the strengths of the map
    Set<String> keySet = ClassifierMap.CLASSIFIER_MAP.keySet();
    for (String key : keySet) {
        /* used for debug to make sure wheel is working.
        if (strengthSum == 0.0) {
            ClassifierMap.CLASSIFIER_MAP.get(key).setStrength(8000.0);
        }
        */
        Classifier classifier = ClassifierMap.CLASSIFIER_MAP.get(key);
        double strength = classifier.getStrength();
        strengthSum = strengthSum + strength;
    }
    System.out.println("strengthSum: " + strengthSum);
    // compute the total probability. this will be 1.00 or close to it.
    for (String key : keySet) {
        Classifier classifier = ClassifierMap.CLASSIFIER_MAP.get(key);
        double probability = (classifier.getStrength() / strengthSum);
        probabilitySum = probabilitySum + probability;
    }
    System.out.println("probabilitySum: " + probabilitySum);
    while (classifierList.size() < classifiers) {
        boolean winnerFound = false;
        double rouletteRandom = random.nextDouble();
        double rouletteSum = 0.0;
        for (String key : keySet) {
            Classifier classifier = ClassifierMap.CLASSIFIER_MAP.get(key);
            double probability = (classifier.getStrength() / strengthSum);
            rouletteSum = rouletteSum + probability;
            if (rouletteSum > rouletteRandom && (winnerFound == false)) {
                System.out.println("Winner found: " + probability);
                classifierList.add(classifier);
                winnerFound = true;
            }
        }
    }
    return classifierList;
}

You can use a data structure like this:
Map<A, B> roulette_wheel_schema = new LinkedHashMap<A, B>()
where A is an integer that represents a pocket of the roulette wheel, and B is an index that identifies a chromosome in the population. The number of pockets assigned to each chromosome is proportional to its proportionate fitness:
number of pockets = (proportionate fitness) · (scale factor)
Then we generate a random integer between 0 and the size of the selection schema, and with this random number we get the index of the chromosome from the roulette wheel.
We calculate the relative error between the proportionate fitness of each chromosome and the probability of being selected by the selection scheme.
The method getRouletteWheel returns the selection scheme based on the previous data structure.
private Map<Integer, Integer> getRouletteWheel(
        ArrayList<Chromosome_fitnessProportionate> chromosomes,
        int precision) {
    /*
     * The number of pockets on the wheel
     *
     * number of pockets in roulette_wheel_schema = probability ·
     * (10^precision)
     */
    Map<Integer, Integer> roulette_wheel_schema = new LinkedHashMap<Integer, Integer>();
    double fitness_proportionate = 0.0D;
    double pockets = 0.0D;
    int key_counter = -1;
    double scale_factor = Math
            .pow(new Double(10.0D), new Double(precision));
    for (int index_cromosome = 0; index_cromosome < chromosomes.size(); index_cromosome++){
        Chromosome_fitnessProportionate chromosome = chromosomes
                .get(index_cromosome);
        fitness_proportionate = chromosome.getFitness_proportionate();
        fitness_proportionate *= scale_factor;
        pockets = Math.rint(fitness_proportionate);
        System.out.println("... " + index_cromosome + " : " + pockets);
        for (int j = 0; j < pockets; j++) {
            roulette_wheel_schema.put(Integer.valueOf(++key_counter),
                    Integer.valueOf(index_cromosome));
        }
    }
    return roulette_wheel_schema;
}

I have worked out a Java code similar to that of Dan Dyer (referenced earlier). My roulette-wheel, however, selects a single element based on a probability vector (input) and returns the index of the selected element.
Having said that, the following code is more appropriate if the selection size is unitary and if you do not assume how the probabilities are calculated and zero probability value is allowed. The code is self-contained and includes a test with 20 wheel spins (to run).
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.logging.Level;
import java.util.logging.Logger;
/**
 * Roulette-wheel Test version.
 * Features a probability vector input with possibly null probability values.
 * Appropriate for adaptive operator selection such as Probability Matching
 * or Adaptive Pursuit, (Dynamic) Multi-armed Bandit.
 * @version October 2015.
 * @author Hakim Mitiche
 */
public class RouletteWheel {
/**
 * Selects an element probabilistically.
 * @param wheelProbabilities elements probability vector.
 * @param rng random generator object
 * @return selected element index
 * @throws java.lang.Exception
 */
public int select(List<Double> wheelProbabilities, Random rng)
throws Exception{
double[] cumulativeProba = new double[wheelProbabilities.size()];
cumulativeProba[0] = wheelProbabilities.get(0);
for (int i = 1; i < wheelProbabilities.size(); i++)
{
double proba = wheelProbabilities.get(i);
cumulativeProba[i] = cumulativeProba[i - 1] + proba;
}
int last = wheelProbabilities.size()-1;
if (cumulativeProba[last] != 1.0)
{
throw new Exception("The probabilities does not sum up to one ("
+ "sum="+cumulativeProba[last]);
}
double r = rng.nextDouble();
int selected = Arrays.binarySearch(cumulativeProba, r);
if (selected < 0)
{
/* Convert negative insertion point to array index.
to find the correct cumulative proba range index.
*/
selected = Math.abs(selected + 1);
}
/* skip indexes of elements with Zero probability,
go backward to matching index*/
int i = selected;
while (wheelProbabilities.get(i) == 0.0){
System.out.print(i+" selected, correction");
i--;
if (i<0) i=last;
}
selected = i;
return selected;
}
public static void main(String[] args){
RouletteWheel rw = new RouletteWheel();
int rept = 20;
List<Double> P = new ArrayList<>(4);
P.add(0.2);
P.add(0.1);
P.add(0.6);
P.add(0.1);
Random rng = new Random();
for (int i = 0 ; i < rept; i++){
try {
int s = rw.select(P, rng);
System.out.println("Element selected "+s+ ", P(s)="+P.get(s));
} catch (Exception ex) {
Logger.getLogger(RouletteWheel.class.getName()).log(Level.SEVERE, null, ex);
}
}
P.clear();
P.add(0.2);
P.add(0.0);
P.add(0.5);
P.add(0.0);
P.add(0.1);
P.add(0.2);
//rng = new Random();
for (int i = 0 ; i < rept; i++){
try {
int s = rw.select(P, rng);
System.out.println("Element selected "+s+ ", P(s)="+P.get(s));
} catch (Exception ex) {
Logger.getLogger(RouletteWheel.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
/**
 * {@inheritDoc}
 * @return
 */
@Override
public String toString()
{
return "Roulette Wheel Selection";
}
}
Below is an execution sample for a probability vector P=[0.2,0.1,0.6,0.1],
WheelElements = [0,1,2,3]:
Element selected 3, P(s)=0.1
Element selected 2, P(s)=0.6
Element selected 3, P(s)=0.1
Element selected 2, P(s)=0.6
Element selected 1, P(s)=0.1
Element selected 2, P(s)=0.6
Element selected 3, P(s)=0.1
Element selected 2, P(s)=0.6
Element selected 2, P(s)=0.6
Element selected 2, P(s)=0.6
Element selected 2, P(s)=0.6
Element selected 2, P(s)=0.6
Element selected 3, P(s)=0.1
Element selected 2, P(s)=0.6
Element selected 2, P(s)=0.6
Element selected 2, P(s)=0.6
Element selected 0, P(s)=0.2
Element selected 2, P(s)=0.6
Element selected 2, P(s)=0.6
Element selected 2, P(s)=0.6
The code also tests a roulette wheel with zero probability.

I am afraid that anybody using the built-in random number generator of any programming language must be aware that the numbers generated are not 100% random, so they should be used with caution.

Random Number Generator pseudo code
add one to a sequential counter
get the current value of the sequential counter
add the computer tick count or some other small interval timer value to the counter value
optionally add additional numbers, like a number from an external piece of hardware like a plasma generator or some other type of somewhat random phenomena
divide the result by a very big prime number
359334085968622831041960188598043661065388726959079837 for example
get some digits from the far right of the decimal point of the result
use these digits as a random number
Use the random number digits to create random numbers between 1 and 38 (or 37 European) for roulette.

Related

Random sample of values with the specified resulting probabilities

Imagine we have four symbols - 'a', 'b', 'c', 'd'. We also have four given probabilities of those symbols appearing in the function output - P1, P2, P3, P4 (the sum of which is equal to 1). How would one implement a function which would generate a random sample of three of those symbols, such that the resulting symbols are present in it with those specified probabilities?
Example: 'a', 'b', 'c' and 'd' have the probabilities of 9/30, 8/30, 7/30 and 6/30 respectively. The function outputs various random samples of any three out of those four symbols: 'abc', 'dca', 'bad' and so on. We run this function many times, counting the amount of times each of the symbols is encountered in its output. At the end, the value of counts stored for 'a' divided by the total amount of symbols output should converge to 9/30, for 'b' to 8/30, for 'c' to 7/30, and for 'd' to 6/30.
E.g. the function generates 10 outputs:
adc
dab
bca
dab
dba
cab
dcb
acd
cab
abc
which out of 30 symbols contains 9 of 'a', 8 of 'b', 7 of 'c' and 6 of 'd'. This is an idealistic example, of course, as the values would only converge when the number of samples is much larger - but it should hopefully convey the idea.
Obviously, all of this is only possible when none of the probabilities is larger than 1/3, since each single sample output would always contain three distinct symbols. It is ok for the function to enter an infinite loop or otherwise behave erratically if it's impossible to satisfy the values provided.
Note: the function should obviously use an RNG, but should otherwise be stateless. Each new invocation should be independent from any of the previous ones, except for the RNG state.
EDIT: Even though the description mentions choosing 3 out of 4 values, ideally the algorithm should be able to cope with any sample size.
Your problem is underdetermined.
If we assign a probability to each string of three letters that we allow, p(abc), p(abd), p(acd), etc., we can generate a series of equations:
eqn1: p(abc) + p(abd) + ... others with an "a" ... = p1
...
...
eqn4: p(abd) + p(acd) + ... others with a "d" ... = p4
This has more unknowns than equations, so there are many ways of solving it. Once a solution is found, by whatever method you choose (use the simplex algorithm if you are me), sample from the probabilities of each string using the roulette method that @alestanis describes.
from numpy import *
# using cvxopt-1.1.5
from cvxopt import matrix, solvers
###########################
# Functions to do some parts
# function to find all valid outputs
def perms(alphabet, length):
    if length == 0:
        yield ""
        return
    for i in range(len(alphabet)):
        val1 = alphabet[i]
        for val2 in perms(alphabet[:i]+alphabet[i+1:], length-1):
            yield val1 + val2
# roulette sampler
def roulette_sampler(values, probs):
    # Create cumulative prob distro (one entry per value)
    probs_cum = [sum(probs[:i+1]) for i in range(len(values))]
    def fun():
        r = random.rand()
        for p,s in zip(probs_cum, values):
            if r < p:
                return s
        # in case of rounding error
        return values[-1]
    return fun
############################
# Main Part
# create list of all valid strings
alphabet = "abcd"
string_length = 3
alpha_probs = [string_length*x/30. for x in range(9,5,-1)]
# show probs
for a,p in zip(alphabet, alpha_probs):
    print("p("+a+") =", p)
# all valid outputs for this particular case
strings = [perm for perm in perms(alphabet, string_length)]
n_strings = len(strings)
# constraints from probabilities p(abc) + p(abd) ... = p(a)
contains = array([[1. if s.find(a) >= 0 else 0. for a in alphabet] for s in strings])
#both = concatenate((contains,wons), axis=1).T # hacky, but whatever
#A = matrix(both)
#b = matrix(alpha_probs + [1.])
A = matrix(contains.T)
b = matrix(alpha_probs)
#also need to constrain to [0,1]
wons = array([[1. for s in strings]])
G = matrix(concatenate((eye(n_strings),wons,-eye(n_strings),-wons)))
h = matrix(concatenate((ones(n_strings+1),zeros(n_strings+1))))
## target matrices for approx KL divergence
# uniform prob over valid outputs
u = 1./len(strings)
P = matrix(eye(n_strings))
q = -0.5*u*matrix(ones(n_strings))
# will minimise p^2 - pq for each p val equally
# Do convex optimisation
sol = solvers.qp(P,q,G,h,A,b)
probs = array(sol['x'])
# Print output
for s,p in zip(strings,probs):
    print("p("+s+") =", p)
checkprobs = [0. for char in alphabet]
for a,i in zip(alphabet, range(len(alphabet))):
    for s,p in zip(strings,probs):
        if s.find(a) > -1:
            checkprobs[i] += p
    print("p("+a+") =", checkprobs[i])
print("total =", sum(probs))
# Create the sampling function
rndstring = roulette_sampler(strings, probs)
###################
# Verify
print("sampling...")
test_n = 1000
output = [rndstring() for i in range(test_n)]
# find which one it is
sampled_freqs = []
for char in alphabet:
    n = 0
    for val in output:
        if val.find(char) > -1:
            n += 1
    sampled_freqs += [n]
print("plotting histogram...")
import matplotlib.pyplot as plt
plt.bar(range(0,len(alphabet)),array(sampled_freqs)/float(test_n), width=0.5)
plt.show()
EDIT: Python code
I think this is a pretty interesting problem. I don't know the general solution, but it's easy enough to solve in the case of samples of size n-1 (if there is a solution), since there are exactly n possible samples, each of which corresponds to the absence of one of the elements.
Suppose we are seeking Fa = 9/30, Fb = 8/30, Fc = 7/30, Fd = 6/30 in samples of size 3 from a universe of size 4, as in the OP. We can translate each of those frequencies directly into a frequency of samples by selecting the samples which do not contain the given object. For example, we wish 9/30 of the selected objects to be a's; we cannot have more than one a in a sample, and we always have three symbols in a sample; consequently, 9/10 of the samples must contain a and 1/10 cannot contain a. But there is only one possible sample which doesn't contain a: bcd. So 10% of the samples must be bcd. Similarly, 20% must be acd; 30% abd and 40% abc. (Or, more generally, Fā = 1 - (n-1)Fa where Fā is the frequency of the (unique) sample which does not include a)
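The size n-1 case described above can be sketched in Python as follows. This is only an illustration of the formula Fā = 1 - (n-1)Fa: the function names are made up, `random.choices` does the weighted pick of the symbol to leave out, and the final shuffle is an assumption added so that the order of the symbols is random too.

```python
import random

def exclusion_probabilities(freqs):
    """For samples of size n-1 from n symbols: probability that each symbol is the one left out."""
    k = len(freqs) - 1
    return [1.0 - k * f for f in freqs]

def sample_all_but_one(symbols, freqs, rng=random):
    """Return a string with every symbol except one, left out according to its exclusion probability."""
    excl = exclusion_probabilities(freqs)
    left_out = rng.choices(range(len(symbols)), weights=excl, k=1)[0]
    sample = [s for i, s in enumerate(symbols) if i != left_out]
    rng.shuffle(sample)
    return "".join(sample)

freqs = [9 / 30, 8 / 30, 7 / 30, 6 / 30]
print(exclusion_probabilities(freqs))  # approximately 0.1, 0.2, 0.3, 0.4
print(sample_all_but_one("abcd", freqs))
```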
I can't help thinking that this observation combined with one of the classic ways of generating unique samples can solve the general problem. But I don't have that solution. For what it's worth, the algorithm I'm thinking of is the following:
To select a random sample of size k out of a universe U of n objects:
1) Set needed = k; available = n.
2) For each element in U, select a random number in the range [0, 1).
3) If the random number is less than needed/available:
3a) Add the element to the sample.
3b) Decrement needed by 1. If it reaches 0, we're finished.
4) Decrement available, and continue with the next element in U.
So my idea is that it should be possible to manipulate the frequency of element by changing the threshold in step 3, making it somehow a function of the desired frequency of the corresponding element.
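For reference, the classic uniform version of this sampling procedure can be sketched in Python as below. It uses the running ratio needed/available as the threshold, which is what guarantees exactly k picks; it does not attempt the frequency-dependent thresholds proposed above. The name `selection_sample` is made up for this sketch.

```python
import random

def selection_sample(universe, k, rng=random):
    """Uniform random sample of k distinct elements, visiting each element once in order."""
    needed = k
    available = len(universe)
    sample = []
    for element in universe:
        if needed == 0:
            break
        # Include the element with probability needed / available.
        # When needed == available the test always succeeds, so the sample fills up.
        if rng.random() * available < needed:
            sample.append(element)
            needed -= 1
        available -= 1
    return sample

print(selection_sample("abcdefgh", 3))
```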
Assuming that the length of a word is always one less than the number of symbols, the following C# code does the job:
using System;
using System.Collections.Generic;
using System.Linq;
using MathNet.Numerics.Distributions;
namespace RandomSymbols
{
class Program
{
static void Main(string[] args)
{
// Sample case: Four symbols with the following distribution, and 10000 trials
double[] distribution = { 9.0/30, 8.0/30, 7.0/30, 6.0/30 };
int trials = 10000;
// Create an array containing all of the symbols
char[] symbols = Enumerable.Range('a', distribution.Length).Select(s => (char)s).ToArray();
// We're assuming that the word length is always one less than the number of symbols
int wordLength = symbols.Length - 1;
// Calculate the probability of each symbol being excluded from a given word
double[] excludeDistribution = Array.ConvertAll(distribution, p => 1 - p * wordLength);
// Create a random variable using the MathNet.Numerics library
var randVar = new Categorical(excludeDistribution);
var random = new Random();
randVar.RandomSource = random;
// We'll store all of the words in an array
string[] words = new string[trials];
for (int t = 0; t < trials; t++)
{
// Start with a word containing all of the symbols
var word = new List<char>(symbols);
// Remove one of the symbols
word.RemoveAt(randVar.Sample());
// Randomly permute the remainder
for (int i = 0; i < wordLength; i++)
{
int swapIndex = random.Next(wordLength);
char temp = word[swapIndex];
word[swapIndex] = word[i];
word[i] = temp;
}
// Store the word
words[t] = new string(word.ToArray());
}
// Display words
Array.ForEach(words, w => Console.WriteLine(w));
// Display histogram
Array.ForEach(symbols, s => Console.WriteLine("{0}: {1}", s, words.Count(w => w.Contains(s))));
}
}
}
Update: The following is a C implementation of the method that rici outlined. The tricky part is calculating the thresholds that he mentions, which I did with recursion.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
// ****** Change the following for different symbol distributions, word lengths, and number of trials ******
double targetFreqs[] = {10.0/43, 9.0/43, 8.0/43, 7.0/43, 6.0/43, 2.0/43, 1.0/43 };
const int WORDLENGTH = 4;
const int TRIALS = 1000000;
// *********************************************************************************************************
const int SYMBOLCOUNT = sizeof(targetFreqs) / sizeof(double);
double inclusionProbs[SYMBOLCOUNT];
double probLeftToIncludeTable[SYMBOLCOUNT][SYMBOLCOUNT];
// Calculates the probability that there will be n symbols left to be included when we get to the ith symbol.
double probLeftToInclude(int i, int n)
{
if (probLeftToIncludeTable[i][n] == -1)
{
// If this is the first symbol, then the number of symbols left to be included is simply the word length.
if (i == 0)
{
probLeftToIncludeTable[i][n] = (n == WORDLENGTH ? 1.0 : 0.0);
}
else
{
// Calculate the probability based on the previous symbol's probabilities.
// To get to a point where there are n symbols left to be included, either there were n+1 symbols left
// when we were considering that previous symbol and we included it, leaving n,
// or there were n symbols left and we didn't include it, also leaving n.
// We have to take into account that the previous symbol may have been mandatorily included.
probLeftToIncludeTable[i][n] = probLeftToInclude(i-1, n+1) * (n == SYMBOLCOUNT-i ? 1.0 : inclusionProbs[i-1])
+ probLeftToInclude(i-1, n) * (n == 0 ? 1.0 : 1 - inclusionProbs[i-1]);
}
}
return probLeftToIncludeTable[i][n];
}
// Calculates the probability that the ith symbol won't *have* to be included or *have* to be excluded.
double probInclusionIsOptional(int i)
{
// The probability that inclusion is optional is equal to 1.0
// minus the probability that none of the remaining symbols can be included
// minus the probability that all of the remaining symbols must be included.
return 1.0 - probLeftToInclude(i, 0) - probLeftToInclude(i, SYMBOLCOUNT - i);
}
// Calculates the probability with which the ith symbol should be included, assuming that
// it doesn't *have* to be included or *have* to be excluded.
double inclusionProb(int i)
{
// The following is derived by simple algebra:
// Unconditional probability = (1.0 * probability that it must be included) + (inclusionProb * probability that inclusion is optional)
// therefore...
// inclusionProb = (Unconditional probability - probability that it must be included) / probability that inclusion is optional
return (targetFreqs[i]*WORDLENGTH - probLeftToInclude(i, SYMBOLCOUNT - i)) / probInclusionIsOptional(i);
}
int main(int argc, char* argv[])
{
srand(time(NULL));
// Initialize inclusion probabilities
for (int i=0; i<SYMBOLCOUNT; i++)
for (int j=0; j<SYMBOLCOUNT; j++)
probLeftToIncludeTable[i][j] = -1.0;
// Calculate inclusion probabilities
for (int i=0; i<SYMBOLCOUNT; i++)
{
inclusionProbs[i] = inclusionProb(i);
}
// Histogram
int histogram[SYMBOLCOUNT];
for (int i=0; i<SYMBOLCOUNT; i++)
{
histogram[i] = 0;
}
// Scratchpad for building our words
char word[WORDLENGTH+1];
word[WORDLENGTH] = '\0';
// Run trials
for (int t=0; t<TRIALS; t++)
{
int included = 0;
// Build the word by including or excluding symbols according to the problem constraints
// and the probabilities in inclusionProbs[].
for (int i=0; i<SYMBOLCOUNT && included<WORDLENGTH; i++)
{
if (SYMBOLCOUNT - i == WORDLENGTH - included // if we have to include this symbol
|| (double)rand()/(double)RAND_MAX < inclusionProbs[i]) // or if we get a lucky roll of the dice
{
word[included++] = 'a' + i;
histogram[i]++;
}
}
// Randomly permute the word
for (int i=0; i<WORDLENGTH; i++)
{
int swapee = rand() % WORDLENGTH;
char temp = word[swapee];
word[swapee] = word[i];
word[i] = temp;
}
// Uncomment the following to show each word
// printf("%s\r\n", word);
}
// Show the histogram
for (int i=0; i<SYMBOLCOUNT; i++)
{
printf("%c: target=%d, actual=%d\r\n", 'a'+i, (int)(targetFreqs[i]*WORDLENGTH*TRIALS), histogram[i]);
}
return 0;
}
To do this you have to use a temporary array storing the cumulated sum of your probabilities.
In your example, probabilities are 9/30, 8/30, 7/30 and 6/30 respectively.
You should then have an array:
values = {'a', 'b', 'c', 'd'}
proba = {9/30, 17/30, 24/30, 1}
Then you pick a random number r in [0, 1] and do like this:
char chooseRandom() {
    int i = 0;
    while (r > proba[i])
        ++i;
    return values[i];
}

DP algorithm for bounded Knapsack?

The Wikipedia article about the knapsack problem lists three kinds of it:
1. 0-1 (one item of a type)
2. Bounded (several items of a type)
3. Unbounded (unlimited number of items of a type)
The article contains DP approaches for problem types 1 and 3, but no solution for type 2.
How can the dynamic programming algorithm for solving type 2 be described?
Use the 0-1 variant, but allow repetition of an item in the solution up to the number of times specified in its bound. You would need to maintain a vector stating how many copies of each item you already included in the partial solution.
The other DP solutions mentioned are all suboptimal as they require you to directly simulate the problem, resulting in a O(number of items * maximum weight * total count of items) runtime complexity.
There are many ways to optimize this, and I'll mention a few of them here:
One solution is to apply a technique similar to Sqrt Decomposition and is described here: https://codeforces.com/blog/entry/59606. This algorithm runs in O(number of items * maximum weight * sqrt(maximum weight)).
However, Dorijan Lendvaj describes a much faster algorithm that runs in O(number of items * maximum weight * log(maximum weight)) here: https://codeforces.com/blog/entry/65202?#comment-492168
Another way to think of the above approach is the following:
For each type of item, let's define the following values:
w, the weight/cost of the current type of item
v, the value of the current type of item
n, the number of copies of the current type of item available to use
Phase 1
First, let us consider 2^k, the largest power of 2 less than or equal to n. We insert the following items (each inserted item is in the format (weight, value)): (w, v), (2 * w, 2 * v), (2^2 * w, 2^2 * v), ..., (2^(k-1) * w, 2^(k-1) * v). Note that the items inserted each represent 2^0, 2^1, ..., 2^(k-1) copies of the current type of item respectively.
Observe that this is the same as inserting 2^k - 1 copies of the current type of item. This is because we can simulate the taking of any number of items (represented as n') by taking the combination of the above items that corresponds to the binary representation of n' (For all whole numbers k', if the bit representing 2^k' is set, take the item that represents 2^k' copies of the current type of item).
Phase 2
Lastly, we just insert the items that correspond to the set bits of n - (2^k - 1). (For all whole numbers k', if the bit representing 2^k' is set, insert (2^k' * w, 2^k' * v)).
Now, we can simulate the taking of up to n items of the current type simply by taking a combination of the above inserted items.
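The two phases can be sketched in Python as follows. This is a minimal sketch rather than the code from the linked posts: the function names are made up, items are assumed to be `(weight, value, count)` triples with positive integer weights, and the bundles are fed into an ordinary 0-1 knapsack update.

```python
def split_into_bundles(w, v, n):
    """Split n copies of an item (weight w, value v) into bundles holding a power-of-two number of copies."""
    if n <= 0:
        return []
    k = n.bit_length() - 1                      # largest k with 2**k <= n
    sizes = [1 << j for j in range(k)]          # Phase 1: 1, 2, ..., 2**(k-1) copies
    rest = n - ((1 << k) - 1)                   # copies not covered by phase 1
    # Phase 2: one bundle per set bit of the remainder.
    sizes += [1 << j for j in range(rest.bit_length()) if (rest >> j) & 1]
    return [(s * w, s * v) for s in sizes]

def bounded_knapsack(items, capacity):
    """Best total value for items given as (weight, value, count) triples."""
    best = [0] * (capacity + 1)
    for w, v, n in items:
        for bw, bv in split_into_bundles(w, v, n):
            # 0-1 knapsack update: go downwards so each bundle is used at most once.
            for j in range(capacity, bw - 1, -1):
                best[j] = max(best[j], best[j - bw] + bv)
    return best[capacity]

print(split_into_bundles(1, 1, 6))
print(bounded_knapsack([(2, 3, 2), (4, 5, 3)], 10))
```

For n = 6 the split yields bundles of 1, 2, 1 and 2 copies, which together can represent every count from 0 to 6.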
I don't currently have an exact proof of this solution, but after playing around with it for a while it seems correct. If I can figure one out I may update this post later on.
Proof
First, a proposition: All we have to prove is that inserting the above items allows us to simulate the taking of any number of items of the current type up to n.
With that in mind, let's define some variables:
Let n be the number of items of the current type available
Let x be the number of items of the current type we want to take
Let k be the greatest integer such that 2^k <= n
If x < 2^k, we can easily take x items using the method described in phase 1 of the algorithm:
... we can simulate the taking of any number of items (represented as n') by taking the combination of the above items that corresponds to the binary representation of n' (For all whole numbers k', if the bit representing 2^k' is set, take the item that represents 2^k' copies of the current type of item).
Otherwise, we do the following:
Take n - (2^k - 1) items. This is done by taking all the items inserted in phase 2. Now only the items inserted in phase 1 are available for use.
Take x - (n - (2^k - 1)) items. Since this value is always less than 2^k, we can just use the method used for the first case.
Finally, how do we know that x - (n - (2^k - 1)) < 2^k?
If we simplify the left side, we get:
x - (n - (2^k - 1))
= x - n + 2^k - 1
= x - (n + 1) + 2^k
If the above value was >= 2^k, then x - (n + 1) >= 0 would be true, meaning that x > n. That would be impossible as that's not a valid value of x.
Finally, there is even an approach mentioned here that runs in O(number of items * maximum weight) time.
The algorithm is similar to the brute force method ic3b3rg proposed and just uses simple DP optimizations and sliding window deque to bring down the run time.
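The sliding-window idea can be sketched in Python as below. This is not the code behind the pastebin link; it is a minimal sketch assuming `(weight, value, count)` triples with positive integer weights. For each item, capacities with the same remainder modulo the weight are processed as one chain, and a deque keeps the best candidate among the last `count + 1` positions of that chain.

```python
from collections import deque

def bounded_knapsack_deque(items, capacity):
    """Best total value for (weight, value, count) items in O(number of items * capacity)."""
    best = [0] * (capacity + 1)
    for w, v, n in items:
        new = best[:]
        # Capacities r, r + w, r + 2w, ... form an independent chain.
        for r in range(min(w, capacity + 1)):
            window = deque()  # pairs (q, best[r + q*w] - q*v), values strictly decreasing
            q = 0
            for j in range(r, capacity + 1, w):
                candidate = best[j] - q * v
                while window and window[-1][1] <= candidate:
                    window.pop()
                window.append((q, candidate))
                # Drop positions that would need more than n copies of the item.
                while window[0][0] < q - n:
                    window.popleft()
                new[j] = window[0][1] + q * v
                q += 1
        best = new
    return best[capacity]

print(bounded_knapsack_deque([(2, 3, 2), (4, 5, 3)], 10))
```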
My code was tested on this problem (classical bounded knapsack problem): https://dmoj.ca/problem/knapsack
My code: https://pastebin.com/acezMrMY
I posted an article on Code Project which discusses a more efficient solution to the bounded knapsack algorithm.
From the article:
In the dynamic programming solution, each position of the m array is a
sub-problem of capacity j. In the 0/1 algorithm, for each sub-problem
we consider the value of adding one copy of each item to the knapsack.
In the following algorithm, for each sub-problem we consider the value
of adding the lesser of the quantity that will fit, or the quantity
available of each item.
I've also enhanced the code so that we can determine what's in the
optimized knapsack (as opposed to just the optimized value).
ItemCollection[] ic = new ItemCollection[capacity + 1];
for(int i=0;i<=capacity;i++) ic[i] = new ItemCollection();
for(int i=0;i<items.Count;i++)
for(int j=capacity;j>=0;j--)
if(j >= items[i].Weight) {
int quantity = Math.Min(items[i].Quantity, j / items[i].Weight);
for(int k=1;k<=quantity;k++) {
ItemCollection lighterCollection = ic[j - k * items[i].Weight];
int testValue = lighterCollection.TotalValue + k * items[i].Value;
if(testValue > ic[j].TotalValue) (ic[j] = lighterCollection.Copy()).AddItem(items[i],k);
}
}
private class Item {
public string Description;
public int Weight;
public int Value;
public int Quantity;
public Item(string description, int weight, int value, int quantity) {
Description = description;
Weight = weight;
Value = value;
Quantity = quantity;
}
}
private class ItemCollection {
public Dictionary<string,int> Contents = new Dictionary<string,int>();
public int TotalValue;
public int TotalWeight;
public void AddItem(Item item,int quantity) {
if(Contents.ContainsKey(item.Description)) Contents[item.Description] += quantity;
else Contents[item.Description] = quantity;
TotalValue += quantity * item.Value;
TotalWeight += quantity * item.Weight;
}
public ItemCollection Copy() {
var ic = new ItemCollection();
ic.Contents = new Dictionary<string,int>(this.Contents);
ic.TotalValue = this.TotalValue;
ic.TotalWeight = this.TotalWeight;
return ic;
}
}
The download in the Code Project article includes a test case.
First, store all your data in a single array (with repetition).
Then use the first method mentioned in the Wikipedia article (0-1 knapsack).
For example, solving a bounded knapsack with { 2 (2 times), 4 (3 times), ... } is equivalent to solving a 0-1 knapsack with {2, 2, 4, 4, 4, ...}.
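A minimal Python sketch of that idea, assuming items are given as (weight, value, count) tuples (the names expand_items and knapsack_01 are mine, for illustration only):

```python
def expand_items(items):
    """Repeat each (weight, value, count) item count times as plain (weight, value)."""
    expanded = []
    for weight, value, count in items:
        expanded.extend([(weight, value)] * count)
    return expanded


def knapsack_01(items, capacity):
    """Classic 0-1 knapsack over (weight, value) pairs."""
    dp = [0] * (capacity + 1)
    for weight, value in items:
        # Iterate downwards so each expanded copy is used at most once.
        for j in range(capacity, weight - 1, -1):
            dp[j] = max(dp[j], dp[j - weight] + value)
    return dp[capacity]
```

This is simple, but the expanded list grows with the total number of copies, so it is only practical when the quantities are small.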
I suggest using the greedy method for the fractional knapsack problem. Its complexity is O(n log n). Note that the greedy method is only guaranteed to be optimal when items may be taken fractionally, not for the 0/1 or bounded variants.
The C# code is below.
private static void Knapsack()
{
Console.WriteLine("************Knapsack***************");
Console.WriteLine("Enter no of items");
int _noOfItems = Convert.ToInt32(Console.ReadLine());
int[] itemArray = new int[_noOfItems];
int[] weightArray = new int[_noOfItems];
int[] priceArray = new int[_noOfItems];
int[] fractionArray=new int[_noOfItems];
for(int i=0;i<_noOfItems;i++)
{
Console.WriteLine("[Item"+" "+(i+1)+"]");
Console.WriteLine("");
Console.WriteLine("Enter the Weight");
weightArray[i] = Convert.ToInt32(Console.ReadLine());
Console.WriteLine("Enter the Price");
priceArray[i] = Convert.ToInt32(Console.ReadLine());
Console.WriteLine("");
itemArray[i] = i+1 ;
}//for loop
int temp;
Console.WriteLine(" ");
Console.WriteLine("ITEM" + " " + "WEIGHT" + " "+"PRICE");
Console.WriteLine(" ");
for(int i=0;i<_noOfItems;i++)
{
Console.WriteLine("Item"+" "+(i+1)+" "+weightArray[i]+" "+priceArray[i]);
Console.WriteLine(" ");
}//For Loop For Printing the value.......
//Calculating Fraction for the Item............
for(int i=0;i<_noOfItems;i++)
{
fractionArray[i] = (priceArray[i] / weightArray[i]);
}
Console.WriteLine("Testing.............");
//sorting the Item on the basis of fraction value..........
//Bubble Sort To Sort the Process Priority
for (int i = 0; i < _noOfItems; i++)
{
for (int j = i + 1; j < _noOfItems; j++)
{
if (fractionArray[j] > fractionArray[i])
{
//item Array
temp = itemArray[j];
itemArray[j] = itemArray[i];
itemArray[i] = temp;
//Weight Array
temp = weightArray[j];
weightArray[j] = weightArray[i];
weightArray[i] = temp;
//Price Array
temp = priceArray[j];
priceArray[j] = priceArray[i];
priceArray[i] = temp;
//Fraction Array
temp = fractionArray[j];
fractionArray[j] = fractionArray[i];
fractionArray[i] = temp;
}//if
}//Inner for
}//outer For
// Printing its value..............After Sorting..............
Console.WriteLine(" ");
Console.WriteLine("ITEM" + " " + "WEIGHT" + " " + "PRICE" + " "+"Fraction");
Console.WriteLine(" ");
for (int i = 0; i < _noOfItems; i++)
{
Console.WriteLine("Item" + " " + (itemArray[i]) + " " + weightArray[i] + " " + priceArray[i] + " "+fractionArray[i]);
Console.WriteLine(" ");
}//For Loop For Printing the value.......
Console.WriteLine("");
Console.WriteLine("Enter the Capacity of Knapsack");
int _capacityKnapsack = Convert.ToInt32(Console.ReadLine());
// Creating the values for Solution
int k=0;
int fractionvalue = 0;
int[] _takingItemArray=new int[100];
int sum = 0,_totalPrice=0;
int l = 0;
int _capacity = _capacityKnapsack;
do
{
if(k>=_noOfItems)
{
k = 0;
}
if (_capacityKnapsack >= weightArray[k])
{
_takingItemArray[l] = weightArray[k];
_capacityKnapsack = _capacityKnapsack - weightArray[k];
_totalPrice += priceArray[k];
k++;
l++;
}
else
{
fractionvalue = fractionArray[k];
_takingItemArray[l] = _capacityKnapsack;
_totalPrice += _capacityKnapsack * fractionArray[k];
k++;
l++;
}
sum += _takingItemArray[l-1];
} while (sum != _capacity);
Console.WriteLine("");
Console.WriteLine("Value in Kg Are............");
Console.WriteLine("");
for (int i = 0; i < _takingItemArray.Length; i++)
{
if(_takingItemArray[i]!=0)
{
Console.WriteLine(_takingItemArray[i]);
Console.WriteLine("");
}
else
{
break;
}
}//for loop
Console.WriteLine("Total Value is "+_totalPrice);
}//Method
We can use the 0/1 knapsack algorithm while tracking the number of copies left for each item.
The same tracking can be applied to the unbounded knapsack algorithm to solve the bounded knapsack problem as well.

Printing numbers of the form 2^i * 5^j in increasing order

How do you print numbers of the form 2^i * 5^j in increasing order?
For example:
1, 2, 4, 5, 8, 10, 16, 20
This is actually a very interesting question, especially if you don't want this to be N^2 or NlogN complexity.
What I would do is the following:
Define a data structure containing 2 values (i and j) and the result of the formula.
Define a collection (e.g. std::vector) containing these data structures
Initialize the collection with the value (0,0) (the result is 1 in this case)
Now in a loop do the following:
Look in the collection and take the instance with the smallest value
Remove it from the collection
Print this out
Create 2 new instances based on the instance you just processed
In the first instance increment i
In the second instance increment j
Add both instances to the collection (if they aren't in the collection yet)
Loop until you had enough of it
The performance can be easily tweaked by choosing the right data structure and collection.
E.g. in C++, you could use an std::map, where the key is the result of the formula, and the value is the pair (i,j). Taking the smallest value is then just taking the first instance in the map (*map.begin()).
I quickly wrote the following application to illustrate it (it works!, but contains no further comments, sorry):
#include <math.h>
#include <map>
#include <iostream>
typedef __int64 Integer;
typedef std::pair<Integer,Integer> MyPair;
typedef std::map<Integer,MyPair> MyMap;
Integer result(const MyPair &myPair)
{
return pow((double)2,(double)myPair.first) * pow((double)5,(double)myPair.second);
}
int main()
{
MyMap myMap;
MyPair firstValue(0,0);
myMap[result(firstValue)] = firstValue;
while (true)
{
auto it=myMap.begin();
if (it->first < 0) break; // overflow
MyPair myPair = it->second;
std::cout << it->first << "= 2^" << myPair.first << "*5^" << myPair.second << std::endl;
myMap.erase(it);
MyPair pair1 = myPair;
++pair1.first;
myMap[result(pair1)] = pair1;
MyPair pair2 = myPair;
++pair2.second;
myMap[result(pair2)] = pair2;
}
}
This is well suited to a functional programming style. In F#:
let min (a,b)= if(a<b)then a else b;;
type stream (current, next)=
member this.current = current
member this.next():stream = next();;
let rec merge(a:stream,b:stream)=
if(a.current<b.current) then new stream(a.current, fun()->merge(a.next(),b))
else new stream(b.current, fun()->merge(a,b.next()));;
let rec Squares(start) = new stream(start,fun()->Squares(start*2));;
let rec AllPowers(start) = new stream(start,fun()->merge(Squares(start*2),AllPowers(start*5)));;
let Results = AllPowers(1);;
Works well with Results then being a stream type with current value and a next method.
Walking through it:
I define min for completeness.
I define a stream type to have a current value and a method to return a new stream, essentially the head and tail of a stream of numbers.
I define the function merge, which takes the smaller of the current values of two streams and then increments that stream. It then recurses to provide the rest of the stream. Essentially, given two streams which are in order, it will produce a new stream which is in order.
I define Squares to be a stream of increasing powers of 2 (each value is double the previous one, despite the name).
AllPowers takes the start value and merges the stream of that value's successive doublings (Squares) with the stream produced by AllPowers for the value multiplied by 5, since multiplying by 2 or by 5 are your only two options. You are effectively left with a tree of results.
The result is merging more and more streams, so you merge the following streams
1, 2, 4, 8, 16, 32...
5, 10, 20, 40, 80, 160...
25, 50, 100, 200, 400...
.
.
.
Merging all of these turns out to be fairly efficient with tail recursion and compiler optimisations etc.
These could be printed to the console like this:
let rec PrintAll(s:stream)=
if (s.current > 0) then
do System.Console.WriteLine(s.current)
PrintAll(s.next());;
PrintAll(Results);
let v = System.Console.ReadLine();
Similar things could be done in any language which allows for recursion and passing functions as values (it's only a little more complex if you can't pass functions as variables).
For an O(N) solution, you can use a list of numbers found so far and two indexes: one representing the next number to be multiplied by 2, and the other the next number to be multiplied by 5. Then in each iteration you have two candidate values to choose the smaller one from.
In Python:
numbers = [1]
next_2 = 0
next_5 = 0
for i in range(100):
    mult_2 = numbers[next_2] * 2
    mult_5 = numbers[next_5] * 5
    if mult_2 < mult_5:
        candidate = mult_2
        next_2 += 1
    else:
        candidate = mult_5
        next_5 += 1
    # The comparison here is to avoid appending duplicates
    if candidate > numbers[-1]:
        numbers.append(candidate)
print(numbers)
So we have two loops, one incrementing i and second one incrementing j starting both from zero, right? (multiply symbol is confusing in the title of the question)
You can do something very straightforward:
Add all items in an array
Sort the array
Or do you need another solution with more mathematical analysis?
EDIT: A smarter solution that leverages the similarity with the Merge Sort problem
If we imagine the infinite sets of numbers 2^i and 5^j as two independent streams/lists, this problem looks very much like the well-known Merge Sort problem.
So solution steps are:
Get two numbers one from the each of streams (of 2 and of 5)
Compare
Return smallest
get next number from the stream of the previously returned smallest
and that's it! ;)
PS: The complexity of Merge Sort is always O(n*log(n))
I visualize this problem as a matrix M where M(i,j) = 2^i * 5^j. This means that both the rows and columns are increasing.
Think about drawing a line through the entries in increasing order, clearly beginning at entry (1,1). As you visit entries, the row and column increasing conditions ensure that the shape formed by those cells will always be an integer partition (in English notation). Keep track of this partition (mu = (m1, m2, m3, ...) where mi is the number of smaller entries in row i -- hence m1 >= m2 >= ...). Then the only entries that you need to compare are those entries which can be added to the partition.
Here's a crude example. Suppose you've visited all the xs (mu = (5,3,3,1)), then you need only check the #s:
x x x x x #
x x x #
x x x
x #
#
Therefore the number of checks is the number of addable cells (equivalently the number of ways to go up in Bruhat order if you're of a mind to think in terms of posets).
Given a partition mu, it's easy to determine what the addable states are. Imagine an infinite string of 0s following the last positive entry. Then you can increase mi by 1 if and only if m(i-1) > mi.
Back to the example, for mu = (5,3,3,1) we can increase m1 (6,3,3,1) or m2 (5,4,3,1) or m4 (5,3,3,2) or m5 (5,3,3,1,1).
The solution to the problem then finds the correct sequence of partitions (saturated chain). In pseudocode:
mu = [1,0,0,...,0];  // the first entry (the value 1) is already visited
while (/* some terminate condition or go on forever */) {
  minNext = 0;
  nextCell = -1;
  // look through all addable cells
  for (int i=0; i<mu.length; ++i) {
    if (i==0 or mu[i-1]>mu[i]) {
      // check for new minimum value; the next cell in row i is 2^i * 5^mu[i]
      if (minNext == 0 or 2^i * 5^mu[i] < minNext) {
        nextCell = i;
        minNext = 2^i * 5^mu[i];
      }
    }
  }
  // print next smallest entry and update mu
  print(minNext);
  mu[nextCell]++;
}
I wrote this in Maple stopping after 12 iterations:
1, 2, 4, 5, 8, 10, 16, 20, 25, 32, 40, 50
I also output the order in which the cells were added and got this:
1 2 3 5 7 10
4 6 8 11
9 12
corresponding to this matrix representation:
1, 2, 4, 8, 16, 32...
5, 10, 20, 40, 80, 160...
25, 50, 100, 200, 400...
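For anyone who wants to try the partition idea without Maple, here is a rough Python sketch. It is my own reading of the approach, not the Maple code: the name frontier_sequence is made up, and I index rows by the power of 5 and columns by the power of 2 so that it matches the matrix shown above.

```python
def frontier_sequence(count):
    """Return the first `count` numbers of the form 2^i * 5^j in increasing order."""
    # mu[r] = number of cells already visited in row r; row r holds 5**r * 2**c.
    mu = [0]
    out = []
    while len(out) < count:
        best_row, best_val = None, None
        for r in range(len(mu)):
            # A cell is addable if it is in row 0 or the row above is longer.
            if r == 0 or mu[r - 1] > mu[r]:
                val = 5 ** r * 2 ** mu[r]
                if best_val is None or val < best_val:
                    best_row, best_val = r, val
        out.append(best_val)
        mu[best_row] += 1
        if best_row == len(mu) - 1:
            mu.append(0)  # keep one empty row available at the bottom
    return out
```

Only the addable cells are compared on each step, so the work per number is proportional to the number of rows reached so far.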
First of all, (as others mentioned already) this question is very vague!!!
Nevertheless, I am going to give a shot based on your vague equation and the pattern as your expected result. So I am not sure the following will be true for what you are trying to do, however it may give you some idea about java collections!
import java.util.List;
import java.util.ArrayList;
import java.util.SortedSet;
import java.util.TreeSet;
public class IncreasingNumbers {
private static List<Integer> findIncreasingNumbers(int maxIteration) {
SortedSet<Integer> numbers = new TreeSet<Integer>();
SortedSet<Integer> numbers2 = new TreeSet<Integer>();
for (int i=0;i < maxIteration;i++) {
int n1 = (int)Math.pow(2, i);
numbers.add(n1);
for (int j=0;j < maxIteration;j++) {
int n2 = (int)Math.pow(5, i);
numbers.add(n2);
for (Integer n: numbers) {
int n3 = n*n1;
numbers2.add(n3);
}
}
}
numbers.addAll(numbers2);
return new ArrayList<Integer>(numbers);
}
/**
* Based on the following fuzzy question # StackOverflow
* http://stackoverflow.com/questions/7571934/printing-numbers-of-the-form-2i-5j-in-increasing-order
*
*
* Result:
* 1 2 4 5 8 10 16 20 25 32 40 64 80 100 125 128 200 256 400 625 1000 2000 10000
*/
public static void main(String[] args) {
List<Integer> numbers = findIncreasingNumbers(5);
for (Integer i: numbers) {
System.out.print(i + " ");
}
}
}
If you can do it in O(nlogn), here's a simple solution:
Get an empty min-heap
Put 1 in the heap
while (you want to continue)
Get num from heap
print num
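put num*2 and num*5 in the heap (skipping any value that is already in it, otherwise numbers such as 10 = 2*5 = 5*2 are printed twice)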
There you have it. By min-heap, I mean min-heap
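The steps above can be sketched in Python with the standard heapq module. The function name two_five_products and the seen set used to skip duplicates are my own additions for illustration.

```python
import heapq


def two_five_products(count):
    """Return the first `count` numbers of the form 2^i * 5^j in increasing order."""
    heap = [1]
    seen = {1}   # guards against pushing the same value twice (e.g. 10)
    out = []
    while len(out) < count:
        num = heapq.heappop(heap)
        out.append(num)
        for multiple in (num * 2, num * 5):
            if multiple not in seen:
                seen.add(multiple)
                heapq.heappush(heap, multiple)
    return out
```

Each pop and push costs O(log n), which gives the O(n log n) total mentioned above.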
As a mathematician the first thing I always think about when looking at something like this is "will logarithms help?".
In this case it might.
If our series A is increasing then the series log(A) is also increasing. Since all terms of A are of the form 2^i.5^j then all members of the series log(A) are of the form i.log(2) + j.log(5)
We can then look at the series log(A)/log(2) which is also increasing and its elements are of the form i+j.(log(5)/log(2))
If we work out the i and j that generates the full ordered list for this last series (call it B) then that i and j will also generate the series A correctly.
This is just changing the nature of the problem but hopefully to one where it becomes easier to solve. At each step you can either increase i and decrease j or vice versa.
Looking at a few of the early changes you can make (which I will possibly refer to as transforms of i,j or just transforms) gives us some clues of where we are going.
Clearly increasing i by 1 will increase B by 1. However, given that log(5)/log(2) is approx 2.3, increasing j by 1 while decreasing i by 2 will give an increase of just 0.3. The problem then is, at each stage, finding the minimum possible increase in B for changes of i and j.
To do this I just kept a record as I increased of the most efficient transforms of i and j (ie what to add and subtract from each) to get the smallest possible increase in the series. Then applied whichever one was valid (ie making sure i and j don't go negative).
Since at each stage you can either decrease i or decrease j there are effectively two classes of transforms that can be checked individually. A new transform doesn't have to have the best overall score to be included in our future checks, just better than any other in its class.
To test my thoughts I wrote a sort of program in LinqPad. Key things to note are that the Dump() method just outputs the object to the screen and that the syntax/structure isn't valid for a real C# file. Converting it if you want to run it should be easy though.
Hopefully anything not explicitly explained will be understandable from the code.
void Main()
{
double C = Math.Log(5)/Math.Log(2);
int i = 0;
int j = 0;
int maxi = i;
int maxj = j;
List<int> outputList = new List<int>();
List<Transform> transforms = new List<Transform>();
outputList.Add(1);
while (outputList.Count<500)
{
Transform tr;
if (i==maxi)
{
//We haven't considered i this big before. Lets see if we can find an efficient transform by getting this many i and taking away some j.
maxi++;
tr = new Transform(maxi, (int)(-(maxi-maxi%C)/C), maxi%C);
AddIfWorthwhile(transforms, tr);
}
if (j==maxj)
{
//We haven't considered j this big before. Lets see if we can find an efficient transform by getting this many j and taking away some i.
maxj++;
tr = new Transform((int)(-(maxj*C)), maxj, (maxj*C)%1);
AddIfWorthwhile(transforms, tr);
}
//We have a set of transforms. We first find ones that are valid then order them by score and take the first (smallest) one.
Transform bestTransform = transforms.Where(x=>x.I>=-i && x.J >=-j).OrderBy(x=>x.Score).First();
//Apply transform
i+=bestTransform.I;
j+=bestTransform.J;
//output the next number in out list.
int value = GetValue(i,j);
//This line just gets it to stop when it overflows. I would have expected an exception but maybe LinqPad does magic with them?
if (value<0) break;
outputList.Add(value);
}
outputList.Dump();
}
public int GetValue(int i, int j)
{
return (int)(Math.Pow(2,i)*Math.Pow(5,j));
}
public void AddIfWorthwhile(List<Transform> list, Transform tr)
{
if (list.Where(x=>(x.Score<tr.Score && x.IncreaseI == tr.IncreaseI)).Count()==0)
{
list.Add(tr);
}
}
// Define other methods and classes here
public class Transform
{
public int I;
public int J;
public double Score;
public bool IncreaseI
{
get {return I>0;}
}
public Transform(int i, int j, double score)
{
I=i;
J=j;
Score=score;
}
}
I've not bothered looking at the efficiency of this, but I strongly suspect it's better than some other solutions because at each stage all I need to do is check my set of transforms. Working out how many of these there are compared to "n" is non-trivial. It is clearly related, since the further you go the more transforms there are, but the number of new transforms becomes vanishingly small at higher numbers, so maybe it's just O(1). This O stuff always confused me though. ;-)
One advantage over other solutions is that it allows you to calculate i,j without needing to calculate the product allowing me to work out what the sequence would be without needing to calculate the actual number itself.
For what it's worth, after the first 230 numbers (when int runs out of space) I had 9 transforms to check each time. And given it's only my total that overflowed, I ran it for the first million results and got to i=5191 and j=354. The number of transforms was 23. The size of this number in the list is approximately 10^1810. Runtime to get to this level was approx 5 seconds.
P.S. If you like this answer please feel free to tell your friends since I spent ages on this and a few +1s would be nice compensation. Or in fact just comment to tell me what you think. :)
I'm sure everyone has the answer by now, but I just wanted to point to this solution.
It's a Ctrl C + Ctrl V from
http://www.careercup.com/question?id=16378662
void print(int N)
{
int arr[N];
arr[0] = 1;
int i = 0, j = 0, k = 1;
int numJ, numI;
int num;
for(int count = 1; count < N; )
{
numI = arr[i] * 2;
numJ = arr[j] * 5;
if(numI < numJ)
{
num = numI;
i++;
}
else
{
num = numJ;
j++;
}
if(num > arr[k-1])
{
arr[k] = num;
k++;
count++;
}
}
for(int counter = 0; counter < N; counter++)
{
printf("%d ", arr[counter]);
}
}
The question as put to me was to return an infinite set of solutions. I pondered the use of trees, but felt there was a problem with figuring out when to harvest and prune the tree, given an infinite number of values for i & j. I realized that a sieve algorithm could be used. Starting from zero, determine whether each integer can be written with some values of i and j. This was facilitated by turning answer = (2^i)*(5^j) around and solving for i instead. That gave me i = log2 (answer / (5^j)). Here is the code:
class Program
{
static void Main(string[] args)
{
var startTime = DateTime.Now;
int potential = 0;
do
{
if (ExistsIandJ(potential))
Console.WriteLine("{0}", potential);
potential++;
} while (potential < 100000);
Console.WriteLine("Took {0} seconds", DateTime.Now.Subtract(startTime).TotalSeconds);
}
private static bool ExistsIandJ(int potential)
{
// potential = (2^i)*(5^j)
// 1 = (2^i)*(5^j)/potential
// 1/(2^i) = (5^j)/potential or (2^i) = potential / (5^j)
// i = log2 (potential / (5^j))
for (var j = 0; Math.Pow(5,j) <= potential; j++)
{
var i = Math.Log(potential / Math.Pow(5, j), 2);
if (i == Math.Truncate(i))
return true;
}
return false;
}
}
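Comparing a floating-point logarithm with its truncated value may misjudge some inputs because of rounding. As a hedged alternative, the same membership test can be done with integer arithmetic only, by dividing out every factor of 2 and 5. The function name is_two_five_product is mine, for illustration.

```python
def is_two_five_product(n):
    """Return True if n can be written as 2^i * 5^j with i, j >= 0."""
    if n < 1:
        return False
    for prime in (2, 5):
        while n % prime == 0:
            n //= prime
    # Nothing but factors of 2 and 5 were present if we are left with 1.
    return n == 1
```

For example, filtering range(1, 21) with this test keeps 1, 2, 4, 5, 8, 10, 16 and 20.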

Find out which combinations of numbers in a set add up to a given total

I've been tasked with helping some accountants solve a common problem they have - given a list of transactions and a total deposit, which transactions are part of the deposit? For example, say I have this list of numbers:
1.00
2.50
3.75
8.00
And I know that my total deposit is 10.50, I can easily see that it's made up of the 8.00 and 2.50 transaction. However, given a hundred transactions and a deposit in the millions, it quickly becomes much more difficult.
In testing a brute force solution (which takes way too long to be practical), I had two questions:
With a list of about 60 numbers, it seems to find a dozen or more combinations for any total that's reasonable. I was expecting a single combination to satisfy my total, or maybe a few possibilities, but there always seem to be a ton of combinations. Is there a math principle that describes why this is? It seems that given a collection of random numbers of even a medium size, you can find a multiple combination that adds up to just about any total you want.
I built a brute force solution for the problem, but it's clearly O(n!), and quickly grows out of control. Aside from the obvious shortcuts (exclude numbers larger than the total themselves), is there a way to shorten the time to calculate this?
Details on my current (super-slow) solution:
The list of detail amounts is sorted largest to smallest, and then the following process runs recursively:
Take the next item in the list and see if adding it to your running total makes your total match the target. If it does, set aside the current chain as a match. If it falls short of your target, add it to your running total, remove it from the list of detail amounts, and then call this process again
This way it excludes the larger numbers quickly, cutting the list down to only the numbers it needs to consider. However, it's still n! and larger lists never seem to finish, so I'm interested in any shortcuts I might be able to take to speed this up - I suspect that even cutting 1 number out of the list would cut the calculation time in half.
Thanks for your help!
This special case of the Knapsack problem is called Subset Sum.
C# version
setup test:
using System;
using System.Collections.Generic;
public class Program
{
public static void Main(string[] args)
{
// subtotal list
List<double> totals = new List<double>(new double[] { 1, -1, 18, 23, 3.50, 8, 70, 99.50, 87, 22, 4, 4, 100.50, 120, 27, 101.50, 100.50 });
// get matches
List<double[]> results = Knapsack.MatchTotal(100.50, totals);
// print results
foreach (var result in results)
{
Console.WriteLine(string.Join(",", result));
}
Console.WriteLine("Done.");
Console.ReadKey();
}
}
code:
using System.Collections.Generic;
using System.Linq;
public class Knapsack
{
internal static List<double[]> MatchTotal(double theTotal, List<double> subTotals)
{
List<double[]> results = new List<double[]>();
while (subTotals.Contains(theTotal))
{
results.Add(new double[1] { theTotal });
subTotals.Remove(theTotal);
}
// if no subtotals were passed
// or all matched the Total
// return
if (subTotals.Count == 0)
return results;
subTotals.Sort();
double mostNegativeNumber = subTotals[0];
if (mostNegativeNumber > 0)
mostNegativeNumber = 0;
// if there aren't any negative values
// we can remove any values bigger than the total
if (mostNegativeNumber == 0)
subTotals.RemoveAll(d => d > theTotal);
// if there aren't any negative values
// and sum is less than the total no need to look further
if (mostNegativeNumber == 0 && subTotals.Sum() < theTotal)
return results;
// get the combinations for the remaining subTotals
// skip 1 since we already removed subTotals that match
for (int choose = 2; choose <= subTotals.Count; choose++)
{
// get combinations for each length
IEnumerable<IEnumerable<double>> combos = Combination.Combinations(subTotals.AsEnumerable(), choose);
// add combinations where the sum matches the total to the result list
results.AddRange(from combo in combos
where combo.Sum() == theTotal
select combo.ToArray());
}
return results;
}
}
public static class Combination
{
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int choose)
{
return choose == 0 ? // if choose = 0
new[] { new T[0] } : // return empty Type array
elements.SelectMany((element, i) => // else recursively iterate over array to create combinations
elements.Skip(i + 1).Combinations(choose - 1).Select(combo => (new[] { element }).Concat(combo)));
}
}
results:
100.5
100.5
-1,101.5
1,99.5
3.5,27,70
3.5,4,23,70
3.5,4,23,70
-1,1,3.5,27,70
1,3.5,4,22,70
1,3.5,4,22,70
1,3.5,8,18,70
-1,1,3.5,4,23,70
-1,1,3.5,4,23,70
1,3.5,4,4,18,70
-1,3.5,8,18,22,23,27
-1,3.5,4,4,18,22,23,27
Done.
If subTotals are repeated, there will appear to be duplicate results (the desired effect). In reality, you will probably want to pair each subTotal with some ID in a tuple, so you can relate it back to your data.
If I understand your problem correctly, you have a set of transactions, and you merely wish to know which of them could have been included in a given total. So if there are 4 possible transactions, then there are 2^4 = 16 possible sets to inspect. The problem is that, for 100 possible transactions, the search space has 2^100 = 1267650600228229401496703205376 possible combinations to search over. For 1000 potential transactions in the mix, it grows to a total of
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
sets that you must test. Brute force will hardly be a viable solution on these problems.
Instead, use a solver that can handle knapsack problems. But even then, I'm not sure that you can generate a complete enumeration of all possible solutions without some variation of brute force.
There is a cheap Excel Add-in that solves this problem: SumMatch
The Excel Solver Addin as posted over on superuser.com has a great solution (if you have Excel) https://superuser.com/questions/204925/excel-find-a-subset-of-numbers-that-add-to-a-given-total
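It's kind of like the 0-1 Knapsack problem, which is NP-complete and can be solved through dynamic programming in pseudo-polynomial time (polynomial in the number of items and the numeric value of the target, not in the size of the input).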
http://en.wikipedia.org/wiki/Knapsack_problem
But at the end of the algorithm you also need to check that the sum is what you wanted.
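A minimal Python sketch of such a dynamic programming approach, assuming all amounts are positive and already converted to integer cents (the name find_subset and the return format are my own choices). It returns the indices of one matching subset, not every possible combination.

```python
def find_subset(amounts_cents, target_cents):
    """Return indices of one subset summing to target_cents, or None."""
    # reached[s] = (previous sum, index of the item added to get to s)
    reached = {0: None}
    for idx, amount in enumerate(amounts_cents):
        # Snapshot the sums so each item is used at most once.
        for s in list(reached):
            t = s + amount
            if t <= target_cents and t not in reached:
                reached[t] = (s, idx)
    if target_cents not in reached:
        return None
    chosen = []
    t = target_cents
    while t != 0:
        t, idx = reached[t]
        chosen.append(idx)
    return chosen[::-1]
```

With the amounts from the question, find_subset([100, 250, 375, 800], 1050) returns [1, 3], i.e. the 2.50 and 8.00 transactions. The table holds at most target_cents + 1 sums, which is why the running time depends on the size of the target.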
Depending on your data you could first look at the cents portion of each transaction. Like in your initial example you know that 2.50 has to be part of the total because it is the only set of non-zero cent transactions which add to 50.
Not a super efficient solution, but here's an implementation in CoffeeScript.
combinations returns all possible combinations of the elements in list
combinations = (list) ->
permuations = Math.pow(2, list.length) - 1
out = []
combinations = []
while permuations
out = []
for i in [0..list.length]
y = ( 1 << i )
if( y & permuations and (y isnt permuations))
out.push(list[i])
if out.length <= list.length and out.length > 0
combinations.push(out)
permuations--
return combinations
and then find_components makes use of it to determine which numbers add up to total
find_components = (total, list) ->
# given a list that is assumed to have only unique elements
list_combinations = combinations(list)
for combination in list_combinations
sum = 0
for number in combination
sum += number
if sum is total
return combination
return []
Here's an example
list = [7.2, 3.3, 4.5, 6.0, 2, 4.1]
total = 7.2 + 2 + 4.1
console.log(find_components(total, list))
which returns [ 7.2, 2, 4.1 ]
#include <stdio.h>
#include <stdlib.h>
/* Takes at least 3 numbers as arguments.
* First number is desired sum.
* Find the subset of the rest that comes closest
* to the desired sum without going over.
*/
static long *elements;
static int nelements;
/* A linked list of some elements, not necessarily all */
/* The list represents the optimal subset for elements in the range [index..nelements-1] */
struct status {
long sum; /* sum of all the elements in the list */
struct status *next; /* points to next element in the list */
int index; /* index into elements array of this element */
};
/* find the subset of elements[startingat .. nelements-1] whose sum is closest to but does not exceed desiredsum */
struct status *reportoptimalsubset(long desiredsum, int startingat) {
struct status *sumcdr = NULL;
struct status *sumlist = NULL;
/* sum of zero elements or summing to zero */
if (startingat == nelements || desiredsum == 0) {
return NULL;
}
/* optimal sum using the current element */
/* if current elements[startingat] too big, it won't fit, don't try it */
if (elements[startingat] <= desiredsum) {
sumlist = malloc(sizeof(struct status));
sumlist->index = startingat;
sumlist->next = reportoptimalsubset(desiredsum - elements[startingat], startingat + 1);
sumlist->sum = elements[startingat] + (sumlist->next ? sumlist->next->sum : 0);
if (sumlist->sum == desiredsum)
return sumlist;
}
/* optimal sum not using current element */
sumcdr = reportoptimalsubset(desiredsum, startingat + 1);
if (!sumcdr) return sumlist;
if (!sumlist) return sumcdr;
return (sumcdr->sum < sumlist->sum) ? sumlist : sumcdr;
}
int main(int argc, char **argv) {
struct status *result = NULL;
long desiredsum = strtol(argv[1], NULL, 10);
nelements = argc - 2;
elements = malloc(sizeof(long) * nelements);
for (int i = 0; i < nelements; i++) {
elements[i] = strtol(argv[i + 2], NULL , 10);
}
result = reportoptimalsubset(desiredsum, 0);
if (result)
printf("optimal subset = %ld\n", result->sum);
while (result) {
printf("%ld + ", elements[result->index]);
result = result->next;
}
printf("\n");
}
By the way, it is best to avoid floats and doubles when doing arithmetic and equality comparisons like these.
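To illustrate the point, here is a small Python sketch. The helper to_cents is hypothetical and assumes amounts arrive as strings with at most two decimal places.

```python
from decimal import Decimal


def to_cents(amount):
    """Convert a decimal string such as "2.50" to an integer number of cents."""
    return int(Decimal(amount) * 100)


# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
# so an equality test on the sum fails.
float_match = (0.1 + 0.2 == 0.3)

# The same comparison in integer cents is exact.
cents_match = (to_cents("0.10") + to_cents("0.20") == to_cents("0.30"))
```

Here float_match is False while cents_match is True, which is why sums of money are usually compared as integers or decimals.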

Roulette Selection in Genetic Algorithms

Can anyone provide some pseudo code for a roulette selection function? How would I implement this:
I don't really understand how to read this math notation. I never took any probability or statistics.
It's been a few years since I've done this myself; however, the following pseudocode was easy enough to find on Google.
for all members of population
    sum += fitness of this individual
end for
for all members of population
    sum of probabilities += fitness / sum
    probability of this individual = sum of probabilities
end for
loop until new population is full
    do this twice
        number = Random between 0 and 1
        for all members of population
            if number > previous member's probability and number <= this member's probability
                then you have been selected
        end for
    end
    create offspring
end loop
The site where this came from can be found here if you need further details.
Lots of correct solutions already, but I think this code is clearer.
import random

def select(fs):
    p = random.uniform(0, sum(fs))
    for i, f in enumerate(fs):
        # subtract first, then test, so that i is the slot p landed in
        p -= f
        if p <= 0:
            break
    return i
In addition, if you accumulate the fs, you can produce a more efficient solution.
import bisect
import random

# cumulative fitnesses: cfs[i] is the sum of fs[0..i]
cfs = [sum(fs[:i+1]) for i in range(len(fs))]

def select(cfs):
    return bisect.bisect_left(cfs, random.uniform(0, cfs[-1]))
This is both faster and extremely concise. If you are using C++, the standard library has a similar binary search algorithm available (std::lower_bound).
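As a rough sketch of the cumulative approach described above, the running totals can also be built in a single pass with itertools.accumulate. The helper names make_wheel and spin and the fitness values are illustrative assumptions, not part of the original answer.

```python
import bisect
import itertools
import random

def make_wheel(fitnesses):
    """Return the running totals of the fitness values (the wheel boundaries)."""
    return list(itertools.accumulate(fitnesses))

def spin(wheel, rng=random):
    """Pick an index with probability proportional to its fitness."""
    r = rng.uniform(0, wheel[-1])
    # bisect_left finds the first boundary that is >= r in O(log n)
    return bisect.bisect_left(wheel, r)

fitnesses = [2, 3, 1, 4, 2]          # illustrative values
wheel = make_wheel(fitnesses)        # [2, 5, 6, 10, 12]
rng = random.Random(0)               # seeded only to make the run repeatable
counts = [0] * len(fitnesses)
for _ in range(12000):
    counts[spin(wheel, rng)] += 1
print(counts)  # roughly proportional to the fitnesses
```

Building the wheel once and reusing it for many spins is where the gain comes from; rebuilding it for every selection would cost O(n) again.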
The pseudocode posted contained some unclear elements, and it adds the complexity of generating offspring instead of performing pure selection. Here is a simple Python implementation of that pseudocode:
from random import random

def roulette_select(population, fitnesses, num):
    """ Roulette selection, implemented according to:
        <http://stackoverflow.com/questions/177271/roulette-selection-in-genetic-algorithms/177278#177278>
    """
    total_fitness = float(sum(fitnesses))
    rel_fitness = [f/total_fitness for f in fitnesses]
    # Generate probability intervals for each individual
    probs = [sum(rel_fitness[:i+1]) for i in range(len(rel_fitness))]
    # Draw new population
    new_population = []
    for n in range(num):
        r = random()
        for (i, individual) in enumerate(population):
            if r <= probs[i]:
                new_population.append(individual)
                break
    return new_population
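For comparison, on Python 3.6 or later the standard library's random.choices performs the same kind of fitness-proportionate draw with replacement. The following is a minimal sketch; the population and fitness values are hypothetical.

```python
import random

population = ["a", "b", "c", "d"]     # hypothetical individuals
fitnesses = [1.0, 2.0, 3.0, 4.0]      # hypothetical, non-negative fitness values

rng = random.Random(42)               # seeded only for repeatability
# choices() samples with replacement, weighting each item by its fitness
new_population = rng.choices(population, weights=fitnesses, k=1000)

share_d = new_population.count("d") / len(new_population)
print(share_d)  # should be close to 4 / 10
```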
This is called roulette-wheel selection via stochastic acceptance:
/// \param[in] f_max maximum fitness of the population
///
/// \return index of the selected individual
///
/// \note Assuming positive fitness. Greater is better.
unsigned rw_selection(double f_max)
{
for (;;)
{
// Select randomly one of the individuals
unsigned i(random_individual());
// The selection is accepted with probability fitness(i) / f_max
if (uniform_random_01() < fitness(i) / f_max)
return i;
}
}
The average number of attempts needed for a single selection is:
τ = fmax / avg(f)
fmax is the maximum fitness of the population
avg(f) is the average fitness
τ doesn't depend explicitly on the number of individuals in the population (N), but the ratio can change with N.
However, in many applications (where the fitness remains bounded and the average fitness doesn't diminish to 0 for increasing N) τ doesn't increase unboundedly with N, and thus a typical complexity of this algorithm is O(1) (roulette wheel selection using search algorithms has O(N) or O(log N) complexity).
The probability distribution of this procedure is indeed the same as in the classical roulette-wheel selection.
For further details see:
Roulette-wheel selection via stochastic acceptance (Adam Lipowski, Dorota Lipowska - 2011)
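The estimate τ = fmax / avg(f) can be checked empirically. Below is a small Python sketch of stochastic acceptance that counts attempts per selection; the fitness values and the helper name rw_selection are assumptions made for this illustration.

```python
import random

def rw_selection(fitnesses, f_max, rng):
    """Return (index, attempts) using stochastic acceptance."""
    attempts = 0
    while True:
        attempts += 1
        i = rng.randrange(len(fitnesses))          # uniform candidate
        if rng.random() < fitnesses[i] / f_max:    # accept with prob f_i / f_max
            return i, attempts

fitnesses = [2.0, 3.0, 1.0, 4.0, 2.0]   # illustrative, strictly positive
f_max = max(fitnesses)
rng = random.Random(1)                  # seeded only for repeatability
draws = 20000
total_attempts = 0
counts = [0] * len(fitnesses)
for _ in range(draws):
    i, attempts = rw_selection(fitnesses, f_max, rng)
    counts[i] += 1
    total_attempts += attempts

tau_measured = total_attempts / draws
tau_predicted = f_max / (sum(fitnesses) / len(fitnesses))   # f_max / avg(f)
print(tau_measured, tau_predicted)
```

With these values the prediction is 4 / 2.4, about 1.67 attempts per selection, and the selection frequencies in counts should stay proportional to the fitnesses.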
Here is some code in C:
// Find the sum of fitnesses. The function fitness(i) should
// return the fitness value for member i
float sumFitness = 0.0f;
for (int i = 0; i < nmembers; i++)
    sumFitness += fitness(i);
// Get a floating point number in the interval 0.0 ... sumFitness
float randomNumber = ((float)(rand() % 10000) / 9999.0f) * sumFitness;
// Translate this number to the corresponding member
int memberID = 0;
float partialSum = fitness(0);
while (randomNumber > partialSum && memberID < nmembers - 1)
{
    memberID++;
    partialSum += fitness(memberID);
}
// We have just found the member of the population using the roulette algorithm
// It is stored in the "memberID" variable
// Repeat this procedure as many times as needed to find random members of the population
From the above answer, I got the following, which was clearer to me than the answer itself.
To give an example:
Random(sum) :: Random(12)
Iterating through the population, we check the following: random < sum
Let us choose 7 as the random number.
Index | Fitness | Sum | 7 < Sum
0 | 2 | 2 | false
1 | 3 | 5 | false
2 | 1 | 6 | false
3 | 4 | 10 | true
4 | 2 | 12 | ...
In this example, the fittest individual (Index 3) has the highest chance of being chosen (4/12, about 33%), because the random number only has to land between 6 and 10 for it to be chosen.
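The table can be reproduced with a few lines of Python. This is only a sketch of the running-sum check; the function name select_index is made up for the example.

```python
def select_index(fitnesses, r):
    """Return the first index whose running sum exceeds r."""
    running = 0
    for i, f in enumerate(fitnesses):
        running += f
        if r < running:
            return i
    raise ValueError("r must be smaller than the total fitness")

fitnesses = [2, 3, 1, 4, 2]
print(select_index(fitnesses, 7))   # 3, because the running sums are 2, 5, 6, 10
# Share of the wheel owned by index 3: 4 out of 12
print(fitnesses[3] / sum(fitnesses))
```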
for (unsigned int i = 0; i < sets.size(); i++) {
    sum += sets[i].eval();
}
// named r so that the variable does not shadow the rand() function
double r = ((double)rand() / (double)RAND_MAX) * sum;
sum = 0;
for (unsigned int i = 0; i < sets.size(); i++) {
    sum += sets[i].eval();
    if (r < sum) {
        // breed i
        break;
    }
}
Prof. Thrun of Stanford AI lab also presented a fast(er?) re-sampling code in python during his CS373 of Udacity. Google search result led to the following link:
http://www.udacity-forums.com/cs373/questions/20194/fast-resampling-algorithm
Hope this helps
Here's a compact Java implementation I wrote recently for roulette selection; hopefully it is of use.
public static gene rouletteSelection()
{
float totalScore = 0;
float runningScore = 0;
for (gene g : genes)
{
totalScore += g.score;
}
float rnd = (float) (Math.random() * totalScore);
for (gene g : genes)
{
if ( rnd>=runningScore &&
rnd<=runningScore+g.score)
{
return g;
}
runningScore+=g.score;
}
return null;
}
Roulette Wheel Selection in MATLAB:
TotalFitness=sum(Fitness);
ProbSelection=zeros(PopLength,1);
CumProb=zeros(PopLength,1);
for i=1:PopLength
ProbSelection(i)=Fitness(i)/TotalFitness;
if i==1
CumProb(i)=ProbSelection(i);
else
CumProb(i)=CumProb(i-1)+ProbSelection(i);
end
end
SelectInd=rand(PopLength,1);
for i=1:PopLength
flag=0;
for j=1:PopLength-1
if(CumProb(j)<SelectInd(i) && CumProb(j+1)>=SelectInd(i))
SelectedPop(i,1:IndLength)=CurrentPop(j+1,1:IndLength);
flag=1;
break;
end
end
if(flag==0)
SelectedPop(i,1:IndLength)=CurrentPop(1,1:IndLength);
end
end
There are two ways to implement roulette wheel selection: the usual one and stochastic acceptance.
Usual algorithm:
# there will be some amount of repeating organisms here.
# (fitness must be a non-negative integer for this to work)
mating_pool = []
all_organisms_in_population.each do |organism|
  organism.fitness.times { mating_pool.push(organism) }
end
# [very_fit_organism, very_fit_organism, very_fit_organism, not_so_fit_organism]
return mating_pool.sample #=> random, likely fit, parent!
Stochastic Acceptance algorithm:
max_fitness_in_population = all_organisms_in_population.map(&:fitness).max
loop do
  random_parent = all_organisms_in_population.sample
  probability = random_parent.fitness.to_f / max_fitness_in_population * 100
  # if random_parent's fitness is 90% of the maximum,
  # it's very likely that rand(100) is smaller than it.
  if rand(100) < probability
    return random_parent #=> random, likely fit, parent!
  else
    next #=> or let's keep on searching for one.
  end
end
You can choose either; both select each organism with a probability proportional to its fitness.
Useful resources:
http://natureofcode.com/book/chapter-9-the-evolution-of-code - a beginner-friendly and clear chapter on genetic algorithms. It explains roulette wheel selection as a bucket of wooden letters (the more As you put in, the greater the chance of picking an A; this is the usual algorithm).
https://en.wikipedia.org/wiki/Fitness_proportionate_selection - describes Stochastic Acceptance algorithm.
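The "bucket" idea behind the usual algorithm can also be sketched in Python. The organism names and integer fitness values below are hypothetical; the point is only that the share of each organism in the pool equals its share of the total fitness.

```python
import random
from collections import Counter

# Hypothetical organisms with integer fitness values
organisms = {"weak": 1, "average": 3, "strong": 6}

# Put each organism into the pool once per fitness point
mating_pool = [name for name, fitness in organisms.items() for _ in range(fitness)]

rng = random.Random(7)   # seeded only for repeatability
picks = Counter(rng.choice(mating_pool) for _ in range(10000))
share_strong = picks["strong"] / 10000
print(len(mating_pool), share_strong)   # pool has 10 entries; share is near 0.6
```

The pool grows with the total fitness, so this approach mainly suits small integer fitness values.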
Based on my research, here is another implementation in C#, if there is a need for it:
// those with higher fitness get selected with a higher probability
// return --> index of the selected individual
private int RouletteSelection()
{
double randomFitness = m_random.NextDouble() * m_totalFitness;
int idx = -1;
int mid;
int first = 0;
int last = m_populationSize -1;
mid = (last - first)/2;
// ArrayList's BinarySearch is for exact values only
// so do this by hand.
while (idx == -1 && first <= last)
{
if (randomFitness < (double)m_fitnessTable[mid])
{
last = mid;
}
else if (randomFitness > (double)m_fitnessTable[mid])
{
first = mid;
}
mid = (first + last)/2;
// lies between i and i+1
if ((last - first) == 1)
idx = last;
}
return idx;
}
This Swift 4 array extension implements weighted random selection, a.k.a Roulette selection from its elements:
public extension Array where Element == Double {
/// Consider the elements as weight values and return a weighted random selection by index.
/// a.k.a Roulette wheel selection.
func weightedRandomIndex() -> Int {
var selected: Int = 0
var total: Double = self[0]
for i in 1..<self.count { // start at 1
total += self[i]
if( Double.random(in: 0...1) <= (self[i] / total)) { selected = i }
}
return selected
}
}
For example given the two element array:
[0.9, 0.1]
weightedRandomIndex() will return zero 90% of the time and one 10% of the time.
Here is a more complete test:
let weights = [0.1, 0.7, 0.1, 0.1]
var results = [Int:Int]()
let n = 100000
for _ in 0..<n {
let index = weights.weightedRandomIndex()
results[index] = results[index, default:0] + 1
}
for (key,val) in results.sorted(by: { a,b in weights[a.key] < weights[b.key] }) {
print(weights[key], Double(val)/Double(n))
}
output:
0.1 0.09906
0.1 0.10126
0.1 0.09876
0.7 0.70092
This answer is basically the same as Andrew Mao's answer here:
https://stackoverflow.com/a/15582983/74975
Here is the code in Python. It can also handle negative fitness values.
from numpy import min, sum, ptp, array
from numpy.random import uniform
list_fitness1 = array([-12, -45, 0, 72.1, -32.3])
list_fitness2 = array([0.5, 6.32, 988.2, 1.23])
def get_index_roulette_wheel_selection(list_fitness=None):
    """ It can handle negative also. Make sure your list fitness is 1D-numpy array"""
    scaled_fitness = (list_fitness - min(list_fitness)) / ptp(list_fitness)
    minimized_fitness = 1.0 - scaled_fitness
    total_sum = sum(minimized_fitness)
    r = uniform(low=0, high=total_sum)
    for idx, f in enumerate(minimized_fitness):
        r = r + f
        if r > total_sum:
            return idx
get_index_roulette_wheel_selection(list_fitness1)
get_index_roulette_wheel_selection(list_fitness2)
Make sure your fitness list is a 1D NumPy array.
Scale the fitness list to the range [0, 1].
Transform the maximization problem into a minimization problem with 1.0 - scaled_fitness.
Draw a random number between 0 and sum(minimized_fitness).
Keep adding the elements of minimized_fitness to it until the value is greater than the total sum.
A small fitness gives a bigger value in minimized_fitness, so it has a bigger chance of pushing the running value past the total sum.
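The scaling steps above can be sketched without NumPy as well. This is an illustrative version under the same assumption that lower fitness is better; the helper name minimizing_weights and the fallback for identical fitness values are additions for this example, not part of the original answer.

```python
import random

def minimizing_weights(fitnesses):
    """Map raw fitness (lower is better, may be negative) to weights in [0, 1]."""
    lo, hi = min(fitnesses), max(fitnesses)
    span = hi - lo
    if span == 0:
        return [1.0] * len(fitnesses)    # all equal: fall back to uniform weights
    return [1.0 - (f - lo) / span for f in fitnesses]

raw = [-12, -45, 0, 72.1, -32.3]         # values from the example above
weights = minimizing_weights(raw)
rng = random.Random(3)                   # seeded only for repeatability
picked = rng.choices(range(len(raw)), weights=weights, k=5000)
print(weights)
```

One consequence of this scaling is that the individual with the largest raw fitness gets weight 0 and is never selected, while the one with the smallest raw fitness gets weight 1.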
I wrote a version in C# and am really looking for confirmation that it is indeed correct:
(roulette_selector is a random number which will be in the range 0.0 to 1.0)
private Individual Select_Roulette(double sum_fitness)
{
Individual ret = new Individual();
bool loop = true;
while (loop)
{
//this will give us a double within the range 0.0 to total fitness
double slice = roulette_selector.NextDouble() * sum_fitness;
double curFitness = 0.0;
foreach (Individual ind in _generation)
{
curFitness += ind.Fitness;
if (curFitness >= slice)
{
loop = false;
ret = ind;
break;
}
}
}
return ret;
}

Resources