Random sample of values with the specified resulting probabilities - algorithm

Imagine we have four symbols - 'a', 'b', 'c', 'd'. We also have four given probabilities of those symbols appearing in the function output - P1, P2, P3, P4 (the sum of which is equal to 1). How would one implement a function which would generate a random sample of three of those symbols, such that the resulting symbols are present in it with those specified probabilities?
Example: 'a', 'b', 'c' and 'd' have the probabilities of 9/30, 8/30, 7/30 and 6/30 respectively. The function outputs various random samples of any three out of those four symbols: 'abc', 'dca', 'bad' and so on. We run this function many times, counting the amount of times each of the symbols is encountered in its output. At the end, the value of counts stored for 'a' divided by the total amount of symbols output should converge to 9/30, for 'b' to 8/30, for 'c' to 7/30, and for 'd' to 6/30.
E.g. the function generates 10 outputs:
adc
dab
bca
dab
dba
cab
dcb
acd
cab
abc
which out of 30 symbols contains 9 of 'a', 8 of 'b', 7 of 'c' and 6 of 'd'. This is an idealized example, of course, as the values would only converge when the number of samples is much larger - but it should hopefully convey the idea.
Obviously, this is all only possible when no probability is larger than 1/3, since each single sample output will always contain three distinct symbols. It is OK for the function to enter an infinite loop or otherwise behave erratically if it's impossible to satisfy the values provided.
Note: the function should obviously use an RNG, but should otherwise be stateless. Each new invocation should be independent from any of the previous ones, except for the RNG state.
EDIT: Even though the description mentions choosing 3 out of 4 values, ideally the algorithm should be able to cope with any sample size.

Your problem is underdetermined.
If we assign a probability to each string of three letters that we allow, p(abc), p(abd), p(acd), etc., we can generate a series of equations
eqn1: p(abc) + p(abd) + ... others with a "a" ... = p1
...
...
eqn4: p(abd) + p(acd) + ... others with a "d" ... = p4
This has more unknowns than equations, so there are many ways of solving it. Once a solution is found, by whatever method you choose (use the simplex algorithm if you are me), sample from the probabilities of each string using the roulette method that @alestanis describes.
from numpy import *
# using cvxopt-1.1.5
from cvxopt import matrix, solvers

###########################
# Functions to do some parts

# function to find all valid outputs
def perms(alphabet, length):
    if length == 0:
        yield ""
        return
    for i in range(len(alphabet)):
        val1 = alphabet[i]
        for val2 in perms(alphabet[:i]+alphabet[i+1:], length-1):
            yield val1 + val2

# roulette sampler
def roulette_sampler(values, probs):
    # Create cumulative prob distro
    probs_cum = [sum(probs[:i+1]) for i in range(len(values))]
    def fun():
        r = random.rand()
        for p, s in zip(probs_cum, values):
            if r < p:
                return s
        # in case of rounding error
        return values[-1]
    return fun

############################
# Main Part

# create list of all valid strings
alphabet = "abcd"
string_length = 3
alpha_probs = [string_length*x/30. for x in range(9,5,-1)]

# show probs
for a, p in zip(alphabet, alpha_probs):
    print "p("+a+") =", p

# all valid outputs for this particular case
strings = [perm for perm in perms(alphabet, string_length)]
n_strings = len(strings)

# constraints from probabilities p(abc) + p(abd) ... = p(a)
contains = array([[1. if s.find(a) >= 0 else 0. for a in alphabet] for s in strings])
#both = concatenate((contains,wons), axis=1).T # hacky, but whatever
#A = matrix(both)
#b = matrix(alpha_probs + [1.])
A = matrix(contains.T)
b = matrix(alpha_probs)

# also need to constrain to [0,1]
wons = array([[1. for s in strings]])
G = matrix(concatenate((eye(n_strings), wons, -eye(n_strings), -wons)))
h = matrix(concatenate((ones(n_strings+1), zeros(n_strings+1))))

## target matrices for approx KL divergence
# uniform prob over valid outputs
u = 1./len(strings)
P = matrix(eye(n_strings))
q = -0.5*u*matrix(ones(n_strings))
# will minimise p^2 - pq for each p val equally

# Do convex optimisation
sol = solvers.qp(P, q, G, h, A, b)
probs = array(sol['x'])

# Print output
for s, p in zip(strings, probs):
    print "p("+s+") =", p
checkprobs = [0. for char in alphabet]
for a, i in zip(alphabet, range(len(alphabet))):
    for s, p in zip(strings, probs):
        if s.find(a) > -1:
            checkprobs[i] += p
    print "p("+a+") =", checkprobs[i]
print "total =", sum(probs)

# Create the sampling function
rndstring = roulette_sampler(strings, probs)

###################
# Verify
print "sampling..."
test_n = 1000
output = [rndstring() for i in xrange(test_n)]

# find which one it is
sampled_freqs = []
for char in alphabet:
    n = 0
    for val in output:
        if val.find(char) > -1:
            n += 1
    sampled_freqs += [n]

print "plotting histogram..."
import matplotlib.pyplot as plt
plt.bar(range(0, len(alphabet)), array(sampled_freqs)/float(test_n), width=0.5)
plt.show()
EDIT: Python code

I think this is a pretty interesting problem. I don't know the general solution, but it's easy enough to solve in the case of samples of size n-1 (if there is a solution), since there are exactly n possible samples, each of which corresponds to the absence of one of the elements.
Suppose we are seeking Fa = 9/30, Fb = 8/30, Fc = 7/30, Fd = 6/30 in samples of size 3 from a universe of size 4, as in the OP. We can translate each of those frequencies directly into a frequency of samples by selecting the samples which do not contain the given object. For example, we wish 9/30 of the selected objects to be a's; we cannot have more than one a in a sample, and we always have three symbols in a sample; consequently, 9/10 of the samples must contain a and 1/10 cannot contain a. But there is only one possible sample which doesn't contain a: bcd. So 10% of the samples must be bcd. Similarly, 20% must be acd; 30% abd and 40% abc. (Or, more generally, Fā = 1 - (n-1)Fa where Fā is the frequency of the (unique) sample which does not include a)
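For the n = 4, k = 3 case that observation can be checked numerically; here is a minimal sketch in Python 3 (the variable names and the use of random.choices are mine, not part of the original question):
import random

freqs = {'a': 9/30, 'b': 8/30, 'c': 7/30, 'd': 6/30}
k = 3

# probability of the one sample that omits symbol s is 1 - k * freq(s)
samples = [''.join(sorted(set(freqs) - {s})) for s in freqs]
weights = [1 - k * freqs[s] for s in freqs]
print(dict(zip(samples, weights)))   # bcd: 0.1, acd: 0.2, abd: 0.3, abc: 0.4 (up to float rounding)

# draw one sample with those weights
print(random.choices(samples, weights=weights, k=1)[0])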
I can't help thinking that this observation combined with one of the classic ways of generating unique samples can solve the general problem. But I don't have that solution. For what it's worth, the algorithm I'm thinking of is the following:
To select a random sample of size k out of a universe U of n objects:
1) Set needed = k; available = n.
2) For each element in U, select a random number in the range [0, 1).
3) If the random number is less than needed/available:
3a) Add the element to the sample.
3b) Decrement needed by 1. If it reaches 0, we're finished.
4) Decrement available, and continue with the next element in U.
So my idea is that it should be possible to manipulate the frequency of each element by changing the threshold in step 3, making it somehow a function of the desired frequency of the corresponding element.
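For reference, the uniform version of that algorithm (Knuth's selection sampling) looks like the sketch below in Python 3; biasing the per-element threshold to hit the target frequencies is exactly the part this idea leaves open. Names are mine:
import random

def sample_without_replacement(universe, k):
    needed, available = k, len(universe)
    chosen = []
    for element in universe:
        # the threshold is needed/available, not a fixed k/n
        if random.random() < needed / available:
            chosen.append(element)
            needed -= 1
            if needed == 0:
                break
        available -= 1
    return chosen

print(sample_without_replacement("abcd", 3))   # e.g. ['a', 'c', 'd']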

Assuming that the length of a word is always one less than the number of symbols, the following C# code does the job:
using System;
using System.Collections.Generic;
using System.Linq;
using MathNet.Numerics.Distributions;
namespace RandomSymbols
{
class Program
{
static void Main(string[] args)
{
// Sample case: Four symbols with the following distribution, and 10000 trials
double[] distribution = { 9.0/30, 8.0/30, 7.0/30, 6.0/30 };
int trials = 10000;
// Create an array containing all of the symbols
char[] symbols = Enumerable.Range('a', distribution.Length).Select(s => (char)s).ToArray();
// We're assuming that the word length is always one less than the number of symbols
int wordLength = symbols.Length - 1;
// Calculate the probability of each symbol being excluded from a given word
double[] excludeDistribution = Array.ConvertAll(distribution, p => 1 - p * wordLength);
// Create a random variable using the MathNet.Numerics library
var randVar = new Categorical(excludeDistribution);
var random = new Random();
randVar.RandomSource = random;
// We'll store all of the words in an array
string[] words = new string[trials];
for (int t = 0; t < trials; t++)
{
// Start with a word containing all of the symbols
var word = new List<char>(symbols);
// Remove one of the symbols
word.RemoveAt(randVar.Sample());
// Randomly permute the remainder
for (int i = 0; i < wordLength; i++)
{
int swapIndex = random.Next(wordLength);
char temp = word[swapIndex];
word[swapIndex] = word[i];
word[i] = temp;
}
// Store the word
words[t] = new string(word.ToArray());
}
// Display words
Array.ForEach(words, w => Console.WriteLine(w));
// Display histogram
Array.ForEach(symbols, s => Console.WriteLine("{0}: {1}", s, words.Count(w => w.Contains(s))));
}
}
}
Update: The following is a C implementation of the method that rici outlined. The tricky part is calculating the thresholds that he mentions, which I did with recursion.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
// ****** Change the following for different symbol distributions, word lengths, and number of trials ******
double targetFreqs[] = {10.0/43, 9.0/43, 8.0/43, 7.0/43, 6.0/43, 2.0/43, 1.0/43 };
const int WORDLENGTH = 4;
const int TRIALS = 1000000;
// *********************************************************************************************************
const int SYMBOLCOUNT = sizeof(targetFreqs) / sizeof(double);
double inclusionProbs[SYMBOLCOUNT];
double probLeftToIncludeTable[SYMBOLCOUNT][SYMBOLCOUNT];
// Calculates the probability that there will be n symbols left to be included when we get to the ith symbol.
double probLeftToInclude(int i, int n)
{
if (probLeftToIncludeTable[i][n] == -1)
{
// If this is the first symbol, then the number of symbols left to be included is simply the word length.
if (i == 0)
{
probLeftToIncludeTable[i][n] = (n == WORDLENGTH ? 1.0 : 0.0);
}
else
{
// Calculate the probability based on the previous symbol's probabilities.
// To get to a point where there are n symbols left to be included, either there were n+1 symbols left
// when we were considering that previous symbol and we included it, leaving n,
// or there were n symbols left and we didn't include it, also leaving n.
// We have to take into account that the previous symbol may have been mandatorily included.
probLeftToIncludeTable[i][n] = probLeftToInclude(i-1, n+1) * (n == SYMBOLCOUNT-i ? 1.0 : inclusionProbs[i-1])
+ probLeftToInclude(i-1, n) * (n == 0 ? 1.0 : 1 - inclusionProbs[i-1]);
}
}
return probLeftToIncludeTable[i][n];
}
// Calculates the probability that the ith symbol won't *have* to be included or *have* to be excluded.
double probInclusionIsOptional(int i)
{
// The probability that inclusion is optional is equal to 1.0
// minus the probability that none of the remaining symbols can be included
// minus the probability that all of the remaining symbols must be included.
return 1.0 - probLeftToInclude(i, 0) - probLeftToInclude(i, SYMBOLCOUNT - i);
}
// Calculates the probability with which the ith symbol should be included, assuming that
// it doesn't *have* to be included or *have* to be excluded.
double inclusionProb(int i)
{
// The following is derived by simple algebra:
// Unconditional probability = (1.0 * probability that it must be included) + (inclusionProb * probability that inclusion is optional)
// therefore...
// inclusionProb = (Unconditional probability - probability that it must be included) / probability that inclusion is optional
return (targetFreqs[i]*WORDLENGTH - probLeftToInclude(i, SYMBOLCOUNT - i)) / probInclusionIsOptional(i);
}
int main(int argc, char* argv[])
{
srand(time(NULL));
// Initialize inclusion probabilities
for (int i=0; i<SYMBOLCOUNT; i++)
for (int j=0; j<SYMBOLCOUNT; j++)
probLeftToIncludeTable[i][j] = -1.0;
// Calculate inclusion probabilities
for (int i=0; i<SYMBOLCOUNT; i++)
{
inclusionProbs[i] = inclusionProb(i);
}
// Histogram
int histogram[SYMBOLCOUNT];
for (int i=0; i<SYMBOLCOUNT; i++)
{
histogram[i] = 0;
}
// Scratchpad for building our words
char word[WORDLENGTH+1];
word[WORDLENGTH] = '\0';
// Run trials
for (int t=0; t<TRIALS; t++)
{
int included = 0;
// Build the word by including or excluding symbols according to the problem constraints
// and the probabilities in inclusionProbs[].
for (int i=0; i<SYMBOLCOUNT && included<WORDLENGTH; i++)
{
if (SYMBOLCOUNT - i == WORDLENGTH - included // if we have to include this symbol
|| (double)rand()/(double)RAND_MAX < inclusionProbs[i]) // or if we get a lucky roll of the dice
{
word[included++] = 'a' + i;
histogram[i]++;
}
}
// Randomly permute the word
for (int i=0; i<WORDLENGTH; i++)
{
int swapee = rand() % WORDLENGTH;
char temp = word[swapee];
word[swapee] = word[i];
word[i] = temp;
}
// Uncomment the following to show each word
// printf("%s\r\n", word);
}
// Show the histogram
for (int i=0; i<SYMBOLCOUNT; i++)
{
printf("%c: target=%d, actual=%d\r\n", 'a'+i, (int)(targetFreqs[i]*WORDLENGTH*TRIALS), histogram[i]);
}
return 0;
}

To do this you have to use a temporary array storing the cumulative sum of your probabilities.
In your example, probabilities are 9/30, 8/30, 7/30 and 6/30 respectively.
You should then have an array:
values = {'a', 'b', 'c', 'd'}
proba = {9/30, 17/30, 24/30, 1}
Then you pick a random number r in [0, 1) and do the following:
char chooseRandom() {
    int i = 0;
    while (r > proba[i])
        ++i;
    return values[i];
}

Related

Generate a number different from the numbers in the array

There is an array (with space for more than 1000 elements) holding 1000 large numbers (they can be 64-bit numbers as well). The numbers in the array are not necessarily sorted.
We have to generate a unique number at the 1001st position that is different from the previous 1000.
Justify that the approach used is the best.
My answer (don't know to what extent this was correct):
Sort the numbers, and start from the 0 position. The number that is at 1000th position + 1 is the required number.
Better suggestions for this?
Create an auxiliary array of 1001 elements. Set all these to 1 (or true or Y or whatever you choose). Run through the main array; if you find a number in the range 1..1000 then 0 out (or otherwise falsify) the corresponding element in the auxiliary array. At the end the first element in the auxiliary array which is not 0 (or false) corresponds to a number which is not in the main array.
This is simple, and, I think, O(n) in time complexity, where n is the number of elements in the main array.
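A short Python rendering of that idea (variable names are mine; I use candidates 0..1000, one more candidate than there are inputs, which is what guarantees a free slot):
def missing_small_value(main):
    aux = [True] * (len(main) + 1)     # candidates 0..1000 for 1000 inputs
    for x in main:
        if 0 <= x <= len(main):
            aux[x] = False             # "zero out" candidates that appear
    return aux.index(True)             # at least one survives by pigeonhole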
unsigned ii, slot;
unsigned array[NNN];            /* the input values; NNN = 1000 here */

/* allocate a histogram with one extra slot */
#define XXX (NNN+1)
unsigned histogram[XXX];
memset(histogram, 0, sizeof histogram);

for (ii = 0; ii < NNN; ii++) {
    slot = array[ii] % XXX;
    histogram[slot] += 1;
}
for (slot = 0; slot < XXX; slot++) {
    if (!histogram[slot]) break;
}
/* Now, slot + k * XXX will be a
** number that does not occur in the original array */
Note: this is similar to High Performance Mark's, but at least I typed in the code ...
If you sort your array, you have three possibilities for a unique number:
array[999]+1, if array[999] is not equal to INT_MAX
array[0]-1, if array[0] is not equal to INT_MIN
a number between array[i] and array[i+1], if array[i+1]-array[i]>1 (0<=i<=998). Notice that if the two previous tries have failed, then it is guaranteed that there is a number between two elements in your array.
Notice that this solution will also work for the 1002nd, 1003rd, and so on.
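A minimal sketch of those three checks in Python, assuming arr is already sorted and lo/hi stand in for the type's minimum and maximum values (names are mine):
def unique_number(arr, lo, hi):
    if arr[-1] != hi:
        return arr[-1] + 1
    if arr[0] != lo:
        return arr[0] - 1
    for a, b in zip(arr, arr[1:]):     # a gap is guaranteed to exist here
        if b - a > 1:
            return a + 1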
An attempt at a clumsy c# implementation
public class Test
{
public List<int> Sequence { get; set; }
public void GenerateFirstSequence()
{
Sequence = new List<int>();
for (var i = 0; i < 1000; i++)
{
var x = new Random().Next(0, int.MaxValue);
while (Sequence.Contains(x))
{
x = new Random().Next(0, int.MaxValue);
}
Sequence.Add(x);
}
}
public int GetNumberNotInSequence()
{
var number = Sequence.OrderBy(x => x).Max();
var mustRedefine = number == int.MaxValue && Sequence.Contains(number);
if (mustRedefine)
{
while (Sequence.Contains(number))
{
number = number - 1;
if (!Sequence.Contains(number))
return number;
}
}
return number + 1;
}
}
I have some thoughts on this problem:
You could create a hash table H with 1000 slots. Suppose your array is named A; for each element, take the remainder modulo 1000: m[i] = A[i] % 1000.
If there is a conflict between some A[i] and A[j], i.e. A[i] % 1000 == A[j] % 1000, then by the pigeonhole principle there must exist some k such that no element's remainder modulo 1000 equals k; that k is the number you are going to return.
If there is no conflict at all, just pick H[1] + 1000 as your result.
The complexity of this algorithm is O(l), where l is the original list size; in the example, l = 1000.
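A sketch of that remainder scheme in Python, assuming non-negative inputs (names are mine):
def find_missing(A):
    m = 1000
    by_remainder = {}
    for x in A:
        by_remainder.setdefault(x % m, x)
    for k in range(m):
        if k not in by_remainder:       # an empty residue class: k itself is not in A
            return k
    # every class is hit exactly once, so shifting any element by m gives a
    # number with the same remainder that is not in A
    return by_remainder[1] + m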

Printing numbers of the form 2^i * 5^j in increasing order

How do you print numbers of form 2^i * 5^j in increasing order.
For eg:
1, 2, 4, 5, 8, 10, 16, 20
This is actually a very interesting question, especially if you don't want this to be N^2 or NlogN complexity.
What I would do is the following:
Define a data structure containing 2 values (i and j) and the result of the formula.
Define a collection (e.g. std::vector) containing these data structures
Initialize the collection with the value (0,0) (the result is 1 in this case)
Now in a loop do the following:
Look in the collection and take the instance with the smallest value
Remove it from the collection
Print this out
Create 2 new instances based on the instance you just processed
In the first instance increment i
In the second instance increment j
Add both instances to the collection (if they aren't in the collection yet)
Loop until you had enough of it
The performance can be easily tweaked by choosing the right data structure and collection.
E.g. in C++, you could use an std::map, where the key is the result of the formula, and the value is the pair (i,j). Taking the smallest value is then just taking the first instance in the map (*map.begin()).
I quickly wrote the following application to illustrate it (it works, but contains no further comments, sorry):
#include <math.h>
#include <map>
#include <iostream>
typedef __int64 Integer;
typedef std::pair<Integer,Integer> MyPair;
typedef std::map<Integer,MyPair> MyMap;
Integer result(const MyPair &myPair)
{
return pow((double)2,(double)myPair.first) * pow((double)5,(double)myPair.second);
}
int main()
{
MyMap myMap;
MyPair firstValue(0,0);
myMap[result(firstValue)] = firstValue;
while (true)
{
auto it=myMap.begin();
if (it->first < 0) break; // overflow
MyPair myPair = it->second;
std::cout << it->first << "= 2^" << myPair.first << "*5^" << myPair.second << std::endl;
myMap.erase(it);
MyPair pair1 = myPair;
++pair1.first;
myMap[result(pair1)] = pair1;
MyPair pair2 = myPair;
++pair2.second;
myMap[result(pair2)] = pair2;
}
}
This is well suited to a functional programming style. In F#:
let min (a,b) = if (a < b) then a else b;;

type stream (current, next) =
    member this.current = current
    member this.next():stream = next();;

let rec merge(a:stream, b:stream) =
    if (a.current < b.current) then new stream(a.current, fun() -> merge(a.next(), b))
    else new stream(b.current, fun() -> merge(a, b.next()));;

let rec Squares(start) = new stream(start, fun() -> Squares(start*2));;

let rec AllPowers(start) = new stream(start, fun() -> merge(Squares(start*2), AllPowers(start*5)));;

let Results = AllPowers(1);;
Works well with Results then being a stream type with current value and a next method.
Walking through it:
I define min for completeness.
I define a stream type to have a current value and a method to return a new stream, essentially the head and tail of a stream of numbers.
I define the function merge, which takes the smaller of the current values of two streams and then increments that stream. It then recurses to provide the rest of the stream. Essentially, given two streams which are in order, it will produce a new stream which is in order.
I define squares to be a stream increasing in powers of 2.
AllPowers takes the start value and merges the stream of its repeated doublings with the stream obtained by recursively applying AllPowers to the value multiplied by 5, since these are your only two options. You are effectively left with a tree of results
The result is merging more and more streams, so you merge the following streams
1, 2, 4, 8, 16, 32...
5, 10, 20, 40, 80, 160...
25, 50, 100, 200, 400...
.
.
.
Merging all of these turns out to be fairly efficient with tail recursion and compiler optimisations etc.
These could be printed to the console like this:
let rec PrintAll(s:stream) =
    if (s.current > 0) then
        do System.Console.WriteLine(s.current)
        PrintAll(s.next());;

PrintAll(Results);
let v = System.Console.ReadLine();
Similar things could be done in any language which allows for recursion and passing functions as values (it's only a little more complex if you can't pass functions as variables).
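For comparison, here is a rough Python 3 translation of the same lazy-merge idea using generators (function names are mine):
from itertools import islice

def doublings(start):
    # start, start*2, start*4, ...  (the "Squares" stream above)
    while True:
        yield start
        start *= 2

def merge(a, b):
    # merge two increasing streams into one increasing stream
    x, y = next(a), next(b)
    while True:
        if x < y:
            yield x
            x = next(a)
        else:
            yield y
            y = next(b)

def all_powers(start=1):
    # all numbers of the form start * 2^i * 5^j, in increasing order
    yield start
    yield from merge(doublings(start * 2), all_powers(start * 5))

print(list(islice(all_powers(), 8)))   # [1, 2, 4, 5, 8, 10, 16, 20]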
For an O(N) solution, you can use a list of numbers found so far and two indexes: one representing the next number to be multiplied by 2, and the other the next number to be multiplied by 5. Then in each iteration you have two candidate values to choose the smaller one from.
In Python:
numbers = [1]
next_2 = 0
next_5 = 0
for i in xrange(100):
    mult_2 = numbers[next_2]*2
    mult_5 = numbers[next_5]*5
    if mult_2 < mult_5:
        next = mult_2
        next_2 += 1
    else:
        next = mult_5
        next_5 += 1
    # The comparison here is to avoid appending duplicates
    if next > numbers[-1]:
        numbers.append(next)
print numbers
So we have two loops, one incrementing i and a second one incrementing j, both starting from zero, right? (The multiplication symbol in the title of the question is confusing.)
You can do something very straightforward:
Add all items in an array
Sort the array
Or do you need another solution with more mathematical analysis?
EDIT: A smarter solution, leveraging the similarity with the Merge Sort problem
If we imagine the infinite sets of numbers 2^i and 5^j as two independent streams/lists, this problem looks very much like the well-known Merge Sort problem.
So solution steps are:
Get two numbers, one from each of the streams (of 2 and of 5)
Compare
Return the smallest
Get the next number from the stream of the previously returned smallest
and that's it! ;)
PS: The complexity of Merge Sort is always O(n*log(n))
I visualize this problem as a matrix M where M(i,j) = 2^i * 5^j. This means that both the rows and columns are increasing.
Think about drawing a line through the entries in increasing order, clearly beginning at entry (1,1). As you visit entries, the row and column increasing conditions ensure that the shape formed by those cells will always be an integer partition (in English notation). Keep track of this partition (mu = (m1, m2, m3, ...) where mi is the number of smaller entries in row i -- hence m1 >= m2 >= ...). Then the only entries that you need to compare are those entries which can be added to the partition.
Here's a crude example. Suppose you've visited all the xs (mu = (5,3,3,1)), then you need only check the #s:
x x x x x #
x x x #
x x x
x #
#
Therefore the number of checks is the number of addable cells (equivalently the number of ways to go up in Bruhat order if you're of a mind to think in terms of posets).
Given a partition mu, it's easy to determine what the addable states are. Imagine an infinite string of 0s following the last positive entry. Then you can increase mi by 1 if and only if m(i-1) > mi.
Back to the example, for mu = (5,3,3,1) we can increase m1 (6,3,3,1) or m2 (5,4,3,1) or m4 (5,3,3,2) or m5 (5,3,3,1,1).
The solution to the problem then finds the correct sequence of partitions (saturated chain). In pseudocode:
mu = [1,0,0,...,0];
while (/* some terminate condition or go on forever */) {
    minNext = 0;
    nextCell = [];
    // look through all addable cells
    for (int i=0; i<mu.length; ++i) {
        if (i==0 or mu[i-1]>mu[i]) {
            // check for new minimum value
            if (minNext == 0 or 2^i * 5^(mu[i]+1) < minNext) {
                nextCell = i;
                minNext = 2^i * 5^(mu[i]+1)
            }
        }
    }
    // print next largest entry and update mu
    print(minNext);
    mu[nextCell]++;
}
I wrote this in Maple stopping after 12 iterations:
1, 2, 4, 5, 8, 10, 16, 20, 25, 32, 40, 50
and the outputted sequence of cells added and got this:
1 2 3 5 7 10
4 6 8 11
9 12
corresponding to this matrix representation:
1, 2, 4, 8, 16, 32...
5, 10, 20, 40, 80, 160...
25, 50, 100, 200, 400...
First of all, (as others mentioned already) this question is very vague!!!
Nevertheless, I am going to give it a shot based on your vague equation and the pattern of your expected result. So I am not sure the following will be true for what you are trying to do, however it may give you some idea about Java collections!
import java.util.List;
import java.util.ArrayList;
import java.util.SortedSet;
import java.util.TreeSet;
public class IncreasingNumbers {
private static List<Integer> findIncreasingNumbers(int maxIteration) {
SortedSet<Integer> numbers = new TreeSet<Integer>();
SortedSet<Integer> numbers2 = new TreeSet<Integer>();
for (int i=0;i < maxIteration;i++) {
int n1 = (int)Math.pow(2, i);
numbers.add(n1);
for (int j=0;j < maxIteration;j++) {
int n2 = (int)Math.pow(5, i);
numbers.add(n2);
for (Integer n: numbers) {
int n3 = n*n1;
numbers2.add(n3);
}
}
}
numbers.addAll(numbers2);
return new ArrayList<Integer>(numbers);
}
/**
* Based on the following fuzzy question # StackOverflow
* http://stackoverflow.com/questions/7571934/printing-numbers-of-the-form-2i-5j-in-increasing-order
*
*
* Result:
* 1 2 4 5 8 10 16 20 25 32 40 64 80 100 125 128 200 256 400 625 1000 2000 10000
*/
public static void main(String[] args) {
List<Integer> numbers = findIncreasingNumbers(5);
for (Integer i: numbers) {
System.out.print(i + " ");
}
}
}
If you can do it in O(nlogn), here's a simple solution:
Get an empty min-heap
Put 1 in the heap
while (you want to continue)
    Get num from heap
    print num
    put num*2 and num*5 in the heap
There you have it. By min-heap, I mean a heap where the smallest element is always at the top.
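A small Python sketch of this (heapq implements a min-heap); the seen set is my addition to avoid pushing the same product twice, e.g. 10 = 2*5 = 5*2:
import heapq

def powers_2_5(count):
    heap, seen, out = [1], {1}, []
    while len(out) < count:
        num = heapq.heappop(heap)
        out.append(num)
        for nxt in (num * 2, num * 5):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, nxt)
    return out

print(powers_2_5(8))   # [1, 2, 4, 5, 8, 10, 16, 20]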
As a mathematician the first thing I always think about when looking at something like this is "will logarithms help?".
In this case it might.
If our series A is increasing then the series log(A) is also increasing. Since all terms of A are of the form 2^i.5^j then all members of the series log(A) are of the form i.log(2) + j.log(5)
We can then look at the series log(A)/log(2) which is also increasing and its elements are of the form i+j.(log(5)/log(2))
If we work out the i and j that generates the full ordered list for this last series (call it B) then that i and j will also generate the series A correctly.
This is just changing the nature of the problem but hopefully to one where it becomes easier to solve. At each step you can either increase i and decrease j or vice versa.
Looking at a few of the early changes you can make (which I will refer to as transforms of i,j, or just transforms) gives us some clues about where we are going.
Clearly increasing i by 1 will increase B by 1. However, given that log(5)/log(2) is approx 2.3, increasing j by 1 while decreasing i by 2 will give an increase of just 0.3. The problem then is at each stage finding the minimum possible increase in B for changes of i and j.
To do this I just kept a record, as I went, of the most efficient transforms of i and j (i.e. what to add to and subtract from each) to get the smallest possible increase in the series, then applied whichever one was valid (i.e. making sure i and j don't go negative).
Since at each stage you can either decrease i or decrease j there are effectively two classes of transforms that can be checked individually. A new transform doesn't have to have the best overall score to be included in our future checks, just better than any other in its class.
To test my thoughts I wrote a sort of program in LINQPad. Key things to note are that the Dump() method just outputs the object to screen and that the syntax/structure isn't valid for a real c# file. Converting it if you want to run it should be easy though.
Hopefully anything not explicitly explained will be understandable from the code.
void Main()
{
double C = Math.Log(5)/Math.Log(2);
int i = 0;
int j = 0;
int maxi = i;
int maxj = j;
List<int> outputList = new List<int>();
List<Transform> transforms = new List<Transform>();
outputList.Add(1);
while (outputList.Count<500)
{
Transform tr;
if (i==maxi)
{
//We haven't considered i this big before. Lets see if we can find an efficient transform by getting this many i and taking away some j.
maxi++;
tr = new Transform(maxi, (int)(-(maxi-maxi%C)/C), maxi%C);
AddIfWorthwhile(transforms, tr);
}
if (j==maxj)
{
//We haven't considered j this big before. Lets see if we can find an efficient transform by getting this many j and taking away some i.
maxj++;
tr = new Transform((int)(-(maxj*C)), maxj, (maxj*C)%1);
AddIfWorthwhile(transforms, tr);
}
//We have a set of transforms. We first find ones that are valid then order them by score and take the first (smallest) one.
Transform bestTransform = transforms.Where(x=>x.I>=-i && x.J >=-j).OrderBy(x=>x.Score).First();
//Apply transform
i+=bestTransform.I;
j+=bestTransform.J;
//output the next number in out list.
int value = GetValue(i,j);
//This line just gets it to stop when it overflows. I would have expected an exception but maybe LinqPad does magic with them?
if (value<0) break;
outputList.Add(value);
}
outputList.Dump();
}
public int GetValue(int i, int j)
{
return (int)(Math.Pow(2,i)*Math.Pow(5,j));
}
public void AddIfWorthwhile(List<Transform> list, Transform tr)
{
if (list.Where(x=>(x.Score<tr.Score && x.IncreaseI == tr.IncreaseI)).Count()==0)
{
list.Add(tr);
}
}
// Define other methods and classes here
public class Transform
{
public int I;
public int J;
public double Score;
public bool IncreaseI
{
get {return I>0;}
}
public Transform(int i, int j, double score)
{
I=i;
J=j;
Score=score;
}
}
I've not bothered looking at the efficiency of this but I strongly suspect it's better than some other solutions because at each stage all I need to do is check my set of transforms - working out how many of these there are compared to "n" is non-trivial. It is clearly related, since the further you go the more transforms there are, but the number of new transforms becomes vanishingly small at higher numbers so maybe it's just O(1). This O stuff always confused me though. ;-)
One advantage over other solutions is that it allows you to calculate i,j without needing to calculate the product allowing me to work out what the sequence would be without needing to calculate the actual number itself.
For what it's worth, after the first 230 numbers (when int runs out of space) I had 9 transforms to check each time. And given it's only my total that overflowed, I ran it for the first million results and got to i=5191 and j=354. The number of transforms was 23. The size of this number in the list is approximately 10^1810. Runtime to get to this level was approx 5 seconds.
P.S. If you like this answer please feel free to tell your friends since I spent ages on this and a few +1s would be nice compensation. Or in fact just comment to tell me what you think. :)
I'm sure everyone might have got the answer by now, but I just wanted to give a direction to this solution.
It's a Ctrl C + Ctrl V from
http://www.careercup.com/question?id=16378662
void print(int N)
{
int arr[N];
arr[0] = 1;
int i = 0, j = 0, k = 1;
int numJ, numI;
int num;
for(int count = 1; count < N; )
{
numI = arr[i] * 2;
numJ = arr[j] * 5;
if(numI < numJ)
{
num = numI;
i++;
}
else
{
num = numJ;
j++;
}
if(num > arr[k-1])
{
arr[k] = num;
k++;
count++;
}
}
for(int counter = 0; counter < N; counter++)
{
printf("%d ", arr[counter]);
}
}
The question as put to me was to return an infinite set of solutions. I pondered the use of trees, but felt there was a problem with figuring out when to harvest and prune the tree, given an infinite number of values for i & j. I realized that a sieve algorithm could be used. Starting from zero, determine whether each positive integer had values for i and j. This was facilitated by turning answer = (2^i)*(5^j) around and solving for i instead. That gave me i = log2 (answer / (5^j)). Here is the code:
class Program
{
static void Main(string[] args)
{
var startTime = DateTime.Now;
int potential = 0;
do
{
if (ExistsIandJ(potential))
Console.WriteLine("{0}", potential);
potential++;
} while (potential < 100000);
Console.WriteLine("Took {0} seconds", DateTime.Now.Subtract(startTime).TotalSeconds);
}
private static bool ExistsIandJ(int potential)
{
// potential = (2^i)*(5^j)
// 1 = (2^i)*(5^j)/potential
// 1/(2^1) = (5^j)/potential or (2^i) = potential / (5^j)
// i = log2 (potential / (5^j))
for (var j = 0; Math.Pow(5,j) <= potential; j++)
{
var i = Math.Log(potential / Math.Pow(5, j), 2);
if (i == Math.Truncate(i))
return true;
}
return false;
}
}

Finding the number of digits of an integer

What is the best method to find the number of digits of a positive integer?
I have found these 3 basic methods:
conversion to string
String s = new Integer(t).toString();
int len = s.length();
for loop
for(long long int temp = number; temp >= 1;)
{
temp/=10;
decimalPlaces++;
}
logarithmic calculation
digits = floor( log10( number ) ) + 1;
where you can calculate log10(x) = ln(x) / ln(10) in most languages.
First I thought the string method is the dirtiest one but the more I think about it the more I think it's the fastest way. Or is it?
There's always this method:
n = 1;
if ( i >= 100000000 ) { n += 8; i /= 100000000; }
if ( i >= 10000 ) { n += 4; i /= 10000; }
if ( i >= 100 ) { n += 2; i /= 100; }
if ( i >= 10 ) { n += 1; }
Well the correct answer would be to measure it - but you should be able to make a guess about the number of CPU steps involved in converting strings and going through them looking for an end marker
Then think how many FPU operations/s your processor can do and how easy it is to calculate a single log.
edit: wasting some more time on a monday morning :-)
String s = new Integer(t).toString();
int len = s.length();
One of the problems with high level languages is guessing how much work the system is doing behind the scenes of an apparently simple statement. Mandatory Joel link
This statement involves allocating memory for a string, and possibly a couple of temporary copies of a string. It must parse the integer and copy the digits of it into a string, possibly having to reallocate and move the existing memory if the number is large. It might have to check a bunch of locale settings to decide if your country uses "," or ".", it might have to do a bunch of unicode conversions.
Then finding the length has to scan the entire string, again considering Unicode and any locale-specific settings such as: are you in a right-to-left language?
Alternatively:
digits = floor( log10( number ) ) + 1;
Just because this would be harder for you to do on paper doesn't mean it's hard for a computer! In fact a good rule in high performance computing seems to have been - if something is hard for a human (fluid dynamics, 3d rendering) it's easy for a computer, and if it's easy for a human (face recognition, detecting a voice in a noisy room) it's hard for a computer!
You can generally assume that the builtin maths functions - log/sin/cos etc - have been an important part of computer design for 50 years. So even if they don't map directly onto a hardware function in the FPU, you can bet that the alternative implementation is pretty efficient.
I don't know, and the answer may well be different depending on how your individual language is implemented.
So, stress test it! Implement all three solutions. Run them on 1 through 1,000,000 (or some other huge set of numbers that's representative of the numbers the solution will be running against) and time how long each of them takes.
Pit your solutions against one another and let them fight it out. Like intellectual gladiators. Three algorithms enter! One algorithm leaves!
Test conditions
Decimal numeral system
Positive integers
Up to 10 digits
Language: ActionScript 3
Results
digits: [1,10],
no. of runs: 1,000,000
random sample: 8777509,40442298,477894,329950,513,91751410,313,3159,131309,2
result: 7,8,6,6,3,8,3,4,6,1
CONVERSION TO STRING: 724ms
LOGARITMIC CALCULATION: 349ms
DIV 10 ITERATION: 229ms
MANUAL CONDITIONING: 136ms
Note: Author refrains from making any conclusions for numbers with more than 10 digits.
Script
package {
import flash.display.MovieClip;
import flash.utils.getTimer;
/**
* #author Daniel
*/
public class Digits extends MovieClip {
private const NUMBERS : uint = 1000000;
private const DIGITS : uint = 10;
private var numbers : Array;
private var digits : Array;
public function Digits() {
// ************* NUMBERS *************
numbers = [];
for (var i : int = 0; i < NUMBERS; i++) {
var number : Number = Math.floor(Math.pow(10, Math.random()*DIGITS));
numbers.push(number);
}
trace('Max digits: ' + DIGITS + ', count of numbers: ' + NUMBERS);
trace('sample: ' + numbers.slice(0, 10));
// ************* CONVERSION TO STRING *************
digits = [];
var time : Number = getTimer();
for (var i : int = 0; i < numbers.length; i++) {
digits.push(String(numbers[i]).length);
}
trace('\nCONVERSION TO STRING - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* LOGARITMIC CALCULATION *************
digits = [];
time = getTimer();
for (var i : int = 0; i < numbers.length; i++) {
digits.push(Math.floor( Math.log( numbers[i] ) / Math.log(10) ) + 1);
}
trace('\nLOGARITMIC CALCULATION - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* DIV 10 ITERATION *************
digits = [];
time = getTimer();
var digit : uint = 0;
for (var i : int = 0; i < numbers.length; i++) {
digit = 0;
for(var temp : Number = numbers[i]; temp >= 1;)
{
temp/=10;
digit++;
}
digits.push(digit);
}
trace('\nDIV 10 ITERATION - time: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
// ************* MANUAL CONDITIONING *************
digits = [];
time = getTimer();
var digit : uint;
for (var i : int = 0; i < numbers.length; i++) {
var number : Number = numbers[i];
if (number < 10) digit = 1;
else if (number < 100) digit = 2;
else if (number < 1000) digit = 3;
else if (number < 10000) digit = 4;
else if (number < 100000) digit = 5;
else if (number < 1000000) digit = 6;
else if (number < 10000000) digit = 7;
else if (number < 100000000) digit = 8;
else if (number < 1000000000) digit = 9;
else if (number < 10000000000) digit = 10;
digits.push(digit);
}
trace('\nMANUAL CONDITIONING: ' + (getTimer() - time));
trace('sample: ' + digits.slice(0, 10));
}
}
}
This algorithm might also be good, assuming that:
The number is an integer and binary encoded (the << operation is cheap)
We don't know the number's boundaries
var num = 123456789L;
var len = 0;
var tmp = 1L;
while (tmp <= num)   // <= so that 1 and exact powers of 10 are counted correctly
{
    len++;
    tmp = (tmp << 3) + (tmp << 1);   // tmp * 10 using shifts
}
This algorithm should have speed comparable to the for-loop (2) provided above, but a bit faster, since it uses two bit-shifts and an add instead of a division.
As for the Log10 algorithm, it will give you only an approximate answer (close to the real one, but still approximate), since the analytic formula for computing the Log function involves an infinite series and can't be calculated precisely (Wiki).
Use the simplest solution in whatever programming language you're using. I can't think of a case where counting digits in an integer would be the bottleneck in any (useful) program.
C, C++:
char buffer[32];
int length = sprintf(buffer, "%ld", (long)123456789);
Haskell:
len = (length . show) 123456789
JavaScript:
length = String(123456789).length;
PHP:
$length = strlen(123456789);
Visual Basic (untested):
length = Len(str(123456789)) - 1
conversion to string: This will have to iterate through each digit, find the character that maps to the current digit, add a character to a collection of characters. Then get the length of the resulting String object. Will run in O(n) for n=#digits.
for-loop: will perform 2 mathematical operations: dividing the number by 10 and incrementing a counter. Will run in O(n) for n=#digits.
logarithmic: Will call log10 and floor, and add 1. Looks like O(1) but I'm not really sure how fast the log10 or floor functions are. My knowledge of this sort of thing has atrophied with lack of use so there could be hidden complexity in these functions.
So I guess it comes down to: is looking up digit mappings faster than multiple mathematical operations or whatever is happening in log10? The answer will probably vary. There could be platforms where the character mapping is faster, and others where doing the calculations is faster. Another thing to keep in mind is that the first method creates a new String object that only exists for the purpose of getting the length. This will probably use more memory than the other two methods, but it may or may not matter.
You can obviously eliminate method 1 from the competition, because the atoi/toString algorithm it uses would be similar to method 2.
Method 3's speed depends on whether the code is being compiled for a system whose instruction set includes log base 10.
For very large integers, the log method is much faster. For instance, with a 2491327 digit number (the 11920928th Fibonacci number, if you care), Python takes several minutes to execute the divide-by-10 algorithm, and milliseconds to execute 1+floor(log(n,10)).
import math
def numdigits(n):
    return int(math.floor(math.log10(n))) + 1
Regarding the three methods you propose for "determining the number of digits necessary to represent a given number in a given base", I don't like any of them, actually; I prefer the method I give below instead.
Re your method #1 (strings): Anything involving converting back-and-forth between strings and numbers is usually very slow.
Re your method #2 (temp/=10): This is fatally flawed because it assumes that x/10 always means "x divided by 10". But in many programming languages (eg: C, C++), if "x" is an integer type, then "x/10" means "integer division", which isn't the same thing as floating-point division, and it introduces round-off errors at every iteration, and they accumulate in a recursive formula such as your solution #2 uses.
Re your method #3 (logs): it's buggy for large numbers (at least in C, and probably other languages as well), because floating-point data types tend not to be as precise as 64-bit integers.
Hence I dislike all 3 of those methods: #1 works but is slow, #2 is broken, and #3 is buggy for large numbers. Instead, I prefer this, which works for numbers from 0 up to about 18.44 quintillion:
unsigned NumberOfDigits (uint64_t Number, unsigned Base)
{
    unsigned Digits = 1;
    uint64_t Power  = 1;
    while ( Number / Power >= Base )
    {
        ++Digits;
        Power *= Base;
    }
    return Digits;
}
Keep it simple:
long long int a = 223452355415634664;
int x;
for (x = 1; a >= 10; x++)
{
    a = a / 10;
}
printf("%d", x);
You can use a recursive solution instead of a loop, but somehow similar:
import scala.annotation.tailrec

@tailrec
def digits (i: Long, carry: Int = 1): Int = if (i < 10) carry else digits (i/10, carry+1)

digits (8345012978643L)
With longs, the picture might change - measure small and long numbers independently against different algorithms, and pick the appropriate one, depending on your typical input. :)
Of course nothing beats a switch:
switch (x) {
    case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7: case 8: case 9: return 1;
    case 10: case 11: // ...
    case 99: return 2;
    case 100: // you get the point :)
    default: return 10; // switch only over int
}
except a plain-o-array:
int [] size = {1,1,1,1,1,1,1,1,1,2,2,2,2,2,... };
int x = 234561798;
return size [x];
Some people will tell you to optimize the code-size, but yaknow, premature optimization ...
log(x,n)-mod(log(x,n),1)+1
Where x is the base and n is the number.
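In Python that formula would read roughly as below (keep in mind that floating point log can be off by one for very large n, as noted elsewhere in this thread):
import math

def num_digits(n, base=10):
    x = math.log(n, base)
    return int(x - x % 1) + 1

print(num_digits(1234))      # 4
print(num_digits(255, 16))   # 2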
Here is the measurement in Swift 4.
Algorithms code:
extension Int {
var numberOfDigits0: Int {
var currentNumber = self
var n = 1
if (currentNumber >= 100000000) {
n += 8
currentNumber /= 100000000
}
if (currentNumber >= 10000) {
n += 4
currentNumber /= 10000
}
if (currentNumber >= 100) {
n += 2
currentNumber /= 100
}
if (currentNumber >= 10) {
n += 1
}
return n
}
var numberOfDigits1: Int {
return String(self).count
}
var numberOfDigits2: Int {
var n = 1
var currentNumber = self
while currentNumber > 9 {
n += 1
currentNumber /= 10
}
return n
}
}
Measurement code:
var timeInterval0 = Date()
for i in 0...10000 {
i.numberOfDigits0
}
print("timeInterval0: \(Date().timeIntervalSince(timeInterval0))")
var timeInterval1 = Date()
for i in 0...10000 {
i.numberOfDigits1
}
print("timeInterval1: \(Date().timeIntervalSince(timeInterval1))")
var timeInterval2 = Date()
for i in 0...10000 {
i.numberOfDigits2
}
print("timeInterval2: \(Date().timeIntervalSince(timeInterval2))")
Output
timeInterval0: 1.92149806022644
timeInterval1: 0.557608008384705
timeInterval2: 2.83262193202972
On this measurement basis String conversion is the best option for the Swift language.
I was curious after seeing @daniel.sedlacek's results so I did some testing using Swift for numbers having more than 10 digits. I ran the following script in the playground.
let base = [Double(100090000000), Double(100050000), Double(100050000), Double(100000200)]
var rar = [Double]()
for i in 1...10 {
for d in base {
let v = d*Double(arc4random_uniform(UInt32(1000000000)))
rar.append(v*Double(arc4random_uniform(UInt32(1000000000))))
rar.append(Double(1)*pow(1,Double(i)))
}
}
print(rar)
var timeInterval = NSDate().timeIntervalSince1970
for d in rar {
floor(log10(d))
}
var newTimeInterval = NSDate().timeIntervalSince1970
print(newTimeInterval-timeInterval)
timeInterval = NSDate().timeIntervalSince1970
for d in rar {
var c = d
while c > 10 {
c = c/10
}
}
newTimeInterval = NSDate().timeIntervalSince1970
print(newTimeInterval-timeInterval)
Results of 80 elements
0.105069875717163 for floor(log10(x))
0.867973804473877 for div 10 iterations
Adding one more approach to many of the already mentioned approaches.
The idea is to use binarySearch on an array containing the range of integers based on the digits of the int data type.
The signature of Java Arrays class binarySearch is :
binarySearch(dataType[] array, dataType key) which returns the index of the search key, if it is contained in the array; otherwise, (-(insertion point) – 1).
The insertion point is defined as the point at which the key would be inserted into the array.
Below is the implementation:
static int[] digits = {9, 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999, Integer.MAX_VALUE};

static int digitsCounter(int N)
{
    int digitCount = Arrays.binarySearch(digits, N < 0 ? -N : N);
    return 1 + (digitCount < 0 ? ~digitCount : digitCount);
}
Please note that the above approach only works for : Integer.MIN_VALUE <= N <= Integer.MAX_VALUE, but can be easily extended for Long data type by adding more values to the digits array.
For example,
I) for N = 555, digitCount = Arrays.binarySearch(digits, 555) returns -3 (i.e. -(2)-1), as 555 is not present in the array and would be inserted at point 2, between 99 and 999, like [9, 99, 555, 999, ...].
As the index we got is negative, we need to take the bitwise complement of the result.
At last, we need to add 1 to the result to get the actual number of digits in the number N.
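The same idea can be written more compactly in Python with the bisect module, since bisect_left returns the insertion point directly (this sketch is mine, not part of the answer above):
import bisect

THRESHOLDS = [9, 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999]

def digits_counter(n):
    return bisect.bisect_left(THRESHOLDS, abs(n)) + 1

print(digits_counter(555))   # 3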
In Swift 5.x, you get the number of digits of an integer as below:
Convert to a string and then count the number of characters in the string
let nums = [1, 7892, 78, 92, 90]
for i in nums {
    let ch = String(describing: i)
    print(ch.count)
}
Calculate the number of digits of an integer using a loop
for i in nums {
    var digitCount = 0      // reset for each number, otherwise the counts accumulate
    var tmp = i
    while tmp >= 1 {
        tmp /= 10
        digitCount += 1
    }
    print(digitCount)
}
let numDigits num =
    let num = abs(num)
    let rec numDigitsInner num =
        match num with
        | num when num < 10 -> 1
        | _ -> 1 + numDigitsInner (num / 10)
    numDigitsInner num
F# Version, without casting to a string.

How to use Factoradic system to get or unrank the K-th permutations WITH repeated items

Yesterday I spent the entire day trying to solve a problem that wants me to get the k-th permutation or unrank a permutation.
I found the best way was factoradic numbers; after hours of Googling and reading dozens of PDFs/PowerPoints, I finally managed to make it work perfectly, both with pencil and paper and in code.
Problem now is, when there are repeated items.
I tried everything, but couldn't get the thing to work the way it should. The factoradic always generates a much bigger rank for a permutation; I can't just make it "recognize" only non-repeated permutations.
Does anyone know a way to use the factoradic system to unrank a permutation with repeated items (e.g. abaac)?
If anyone knows, please, I would love a small example and intuitive explanation; that will surely benefit many others in the future.
Thanks a lot :)
PS: Here is my attempted C++ code that I wrote MYSELF. I know it's not optimized at all, but just to show you what I've got so far:
This code will work correctly if there are no repeated items, but will be wrong with repeated items (next_permutation is of course not usable when, say, I want the 1 billionth permutation).
#include <iostream>
#include <cstdio>
#include <string>
#include <algorithm>
using namespace std;
int f(int n) {
if(n<2) return 1;
return n*f(n-1);
}
int pos(string& s,char& c) {
for(int i=0;i<s.size();++i) {
if(s[i]==c) return i;
}
return -1;
}
int main() {
const char* perm = "bedac";
string original=perm;
sort(original.begin(),original.end());
string s=original;
string t=perm;
int res=0;
for(;s!=t && next_permutation(s.begin(),s.end());++res);
cout<<"real:"<<res<<endl;
s=original;
string n;
while(!s.empty()) {
int p=pos(s,t[0]);
n+=p;
t.erase(0,1);
s.erase(p,1);
}
for(res=0;!n.empty();(res+=n[0]*f(n.size()-1)),n.erase(0,1));
cout<<"factoradix:"<<res<<endl;
return 0;
}
In a permutation where all elements are unique, we can generate each element, in a recursive fashion. To rewrite your implementation a bit (in pseudo-code)
def map(k, left):
    ele = k / ((len(left)-1)!)
    return [ele] + map(k % ((len(left)-1)!), left - left[ele])
Here we know a priori how many permutations each choice of leading element accounts for, namely (n-1)!, where n is the number of elements left.
In a permutation with repeated elements, the number of remaining permutations is (n-1)!/((# of 1s)!(# of 2s)! ... (# of ks)!) and this changes based on what element we choose on each level. We need to apply the same idea, but instead of being able to calculate the index on the fly, we need to determine how many sub-permutations there are if we choose element X at each level of the recursion. We subtract that from the permutation number and recurse.
# group_v is the value of an element
# group_members is the number of times it is repeated
# facts_with is group_members[x] factorial
def permap(k, group_v, group_members, facts_with):
    n = sum(group_members)  # how many elements left
    if n == 0:
        return []
    total = math.factorial(n-1)
    total_denom = prod(facts_with)
    start_range = 0; end_range = 0
    for group_i in range(len(group_v)):
        if group_members[group_i] == 0:
            continue
        v = (group_members[group_i]*total)/(total_denom)  # n-1!/((a-1)!...z!)
        end_range += v
        if end_range > k:
            facts_with[group_i] /= group_members[group_i]
            group_members[group_i] -= 1
            return [group_v[group_i]] + permap(k-start_range, group_v, group_members, facts_with)
        else:
            start_range = end_range
    raise Exception()
The full listing in Python
#imports
import itertools
import math
import operator

def prod(lst):
    return reduce(operator.mul, lst)

#mainfunc
def permap(k, group_v, group_members, facts_with):
    n = sum(group_members)
    if n == 0:
        return []
    total = math.factorial(n-1)
    total_denom = prod(facts_with)
    start_range = 0; end_range = 0
    for group_i in range(len(group_v)):
        if group_members[group_i] == 0:
            continue
        v = (group_members[group_i]*total)/(total_denom)  # n-1!/(a!...z!)
        end_range += v
        if end_range > k:
            facts_with[group_i] /= group_members[group_i]
            group_members[group_i] -= 1
            return [group_v[group_i]] + permap(k-start_range, group_v, group_members, facts_with)
        else:
            start_range = end_range
    raise Exception()

items = [1,2,2,1]
n_groups = len(list(itertools.groupby(items)))
facts_with = [0]*(n_groups)
group_v = [0]*(n_groups)
group_members = [0]*(n_groups)
group_i = 0
print [list(g) for k,g in itertools.groupby(items)]
for group in itertools.groupby(items):
    group_v[group_i], group_members[group_i] = group
    group_members[group_i] = len(list(group_members[group_i]))
    facts_with[group_i] = math.factorial(group_members[group_i])
    group_i += 1
for x in range(6):
    print permap(x, list(group_v), list(group_members), list(facts_with))

A problem from a programming competition... Digit Sums

I need help solving problem N from this earlier competition:
Problem N: Digit Sums
Given 3 positive integers A, B and C, find how many positive integers less than or equal to A, when expressed in base B, have digits which sum to C.
Input will consist of a series of lines, each containing three integers, A, B and C, 2 ≤ B ≤ 100, 1 ≤ A, C ≤ 1,000,000,000. The numbers A, B and C are given in base 10 and are separated by one or more blanks. The input is terminated by a line containing three zeros.
Output will be the number of numbers, for each input line (it must be given in base 10).
Sample input
100 10 9
100 10 1
750000 2 2
1000000000 10 40
100000000 100 200
0 0 0
Sample output
10
3
189
45433800
666303
The relevant rules:
Read all input from the keyboard, i.e. use stdin, System.in, cin or equivalent. Input will be redirected from a file to form the input to your submission.
Write all output to the screen, i.e. use stdout, System.out, cout or equivalent. Do not write to stderr. Do NOT use, or even include, any module that allows direct manipulation of the screen, such as conio, Crt or anything similar. Output from your program is redirected to a file for later checking. Use of direct I/O means that such output is not redirected and hence cannot be checked. This could mean that a correct program is rejected!
Unless otherwise stated, all integers in the input will fit into a standard 32-bit computer word. Adjacent integers on a line will be separated by one or more spaces.
Of course, it's fair to say that I should learn more before trying to solve this, but I'd really appreciate it if someone here told me how it's done.
Thanks in advance, John.
Other people pointed out the trivial solution: iterate over all numbers from 1 to A. But this problem can actually be solved in nearly constant time: O(length of A), which is O(log(A)).
The code provided is for base 10. Adapting it to an arbitrary base is trivial.
To reach the above time estimate, you need to add memoization to the recursion. Let me know if you have questions about that part.
Now, the recursive function itself. Written in Java, but everything should work in C#/C++ without any changes. It's big, but mostly because of the comments where I try to clarify the algorithm.
// returns amount of numbers strictly less than 'num' with sum of digits 'sum'
// pay attention to word 'strictly'
int count(int num, int sum) {
// no numbers with negative sum of digits
if (sum < 0) {
return 0;
}
int result = 0;
// imagine, 'num' == 1234
// let's check numbers 1233, 1232, 1231, 1230 manually
while (num % 10 > 0) {
--num;
// check if current number is good
if (sumOfDigits(num) == sum) {
// one more result
++result;
}
}
if (num == 0) {
// zero reached, no more numbers to check
return result;
}
num /= 10;
// Using example above (1234), now we're left with numbers
// strictly less than 1230 to check (1..1229)
// It means, any number less than 123 with arbitrary digit appended to the right
// E.g., if this digit in the right (last digit) is 3,
// then sum of the other digits must be "sum - 3"
// and we need to add to result 'count(123, sum - 3)'
// let's iterate over all possible values of last digit
for (int digit = 0; digit < 10; ++digit) {
result += count(num, sum - digit);
}
return result;
}
Helper function
// returns sum of digits, plain and simple
int sumOfDigits(int x) {
int result = 0;
while (x > 0) {
result += x % 10;
x /= 10;
}
return result;
}
Now, let's write a little tester
int A = 12345;
int C = 13;
// recursive solution
System.out.println(count(A + 1, C));
// brute-force solution
int total = 0;
for (int i = 1; i <= A; ++i) {
if (sumOfDigits(i) == C) {
++total;
}
}
System.out.println(total);
You can write a more comprehensive tester checking all values of A, but the overall solution seems to be correct. (I tried several random A's and C's.)
Don't forget, you can't test the solution for A == 1000000000 without memoization: it'll run too long. But with memoization, you can test it even for A == 10^1000.
edit
Just to prove the concept, a poor man's memoization. (In Java; in other languages hashtables are declared differently.) But if you want to learn something, it might be better to try to do it yourself.
// hold values here
private Map<String, Integer> mem;

int count(int num, int sum) {
    // no numbers with negative sum of digits
    if (sum < 0) {
        return 0;
    }
    String key = num + " " + sum;
    if (mem.containsKey(key)) {
        return mem.get(key);
    }
    // ...
    // continue as above...
    // ...
    mem.put(key, result);
    return result;
}
Here's the same memoized recursive solution that Rybak posted, but with a simpler implementation, in my humble opinion:
HashMap<String, Integer> cache = new HashMap<String, Integer>();

int count(int bound, int base, int sum) {
    // No negative digit sums.
    if (sum < 0)
        return 0;
    // Handle one digit case.
    if (bound < base)
        return (sum <= bound) ? 1 : 0;
    String key = bound + " " + sum;
    if (cache.containsKey(key))
        return cache.get(key);
    int count = 0;
    for (int digit = 0; digit < base; digit++)
        count += count((bound - digit) / base, base, sum - digit);
    cache.put(key, count);
    return count;
}
This is not the complete solution (no input parsing). To get the number in base B, repeatedly take the number modulo B, and then divide by B until the result is 0. This effectively computes the base-B digits from the right, shifting the number right each time.
int A, B, C; // from input
int matches = 0;
for (int x = 1; x <= A; x++)   // "less than or equal to A"
{
    int sumDigits = 0;
    int v = x;
    while (v != 0) {
        sumDigits += (v % B);
        v /= B;
    }
    if (sumDigits == C)
        matches++;
}
cout << matches;   // the problem asks for the count of such numbers
This is a brute force approach. It may be possible to compute this quicker by determining which sets of base B digits add up to C, arranging these in all permutations that are less than A, and then working backwards from that to create the original number.
Yum.
Try this:
int number, digitSum, resultCounter = 0;
for (int i = 1; i <= A; i++)
{
    number = i; //to avoid screwing up our counter
    digitSum = 0;
    while (number > 1)
    {
        //this is the next "digit" of the number as it would be in base B;
        //works with any base including 10.
        digitSum += (number % B);
        //remove this digit from the number, rinse, repeat
        number /= B;
    }
    digitSum += number;
    //Does the sum match?
    if (digitSum == C)
        resultCounter++;
}
That's your basic algorithm for one line. Now you wrap this in another For loop for each input line you received, preceded by the input collection phase itself. This process can be simplified, but I don't feel like coding your entire answer to see if my algorithm works, and this looks right whereas the simpler tricks are harder to pass by inspection.
The way this works is by modulo dividing by powers of the base. Simple example, 1234 in base 10:
1234 % 10 = 4
1234 / 10 = 123 //integer division truncates any fraction
123 % 10 = 3 //sum is 7
123 / 10 = 12
12 % 10 = 2 //sum is 9
12 / 10 = 1 //end condition, add this and the sum is 10
A harder example to figure out by inspection would be the same number in base 12:
1234 % 12 = 10 //you can call it "A" like in hex, but we need a sum anyway
1234 / 12 = 102
102 % 12 = 6 // sum 16
102/12 = 8
8 % 12 = 8 //sum 24
8 / 12 = 0 //end condition, sum still 24.
So 1234 in base 12 would be written 86A. Check the math:
8*12^2 + 6*12 + 10 = 1152 + 72 + 10 = 1234
Have fun wrapping the rest of the code around this.
