Understanding error in oop Backprop implementation - backpropagation

Good evening everybody.
I have been using NN quite often already, so I thought it's time to face the background.
As a result I have been spending quite a lot of hours with my C++ implementation of a neural network from scratch. Still, I do not get any useful output.
My issue is the clean OOP and efficient implementation, especially what I have to backpropagate from one Layer class to the next. I am aware that I'm just skipping the full calculation/forwarding of the jacobian matrices, but from my understanding this isn't necessary, since most of the entries will be cut out.
I have a softmax class with size n:
Forward Pass: It takes an input vector input of length n and creates an output vector output of size n.
sum = 0; for (int i = 0; i < n; i++) sum += e^input[ i ].
Then it calculates an output vector output of length n with:
output [ i ] = e^input [ i ] / sum
Backward Pass: It takes a feedback vector target of size n, the target value.
I do not have weights or biases in my softmax class, so I just calculate the feedback vector feedback of size n:
feedback[ i ] = output[ i ] - target[ i ]
That is what I return from my softmax layer.
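A minimal sketch of the softmax layer described above, written in Python rather than C++ for brevity. The names `softmax_forward` and `softmax_backward` are hypothetical. The rule `output - target` is the gradient with respect to the softmax input only under the assumption that the loss is cross-entropy and `target` is a one-hot (or probability) vector. Subtracting the maximum is an added numerical-stability step that does not change the result.

```python
import math

def softmax_forward(inputs):
    # Subtracting the maximum avoids overflow in exp(); the ratios are unchanged.
    shift = max(inputs)
    exps = [math.exp(v - shift) for v in inputs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_backward(output, target):
    # Gradient of the (assumed) cross-entropy loss w.r.t. the softmax input.
    return [o - t for o, t in zip(output, target)]

out = softmax_forward([1.0, 2.0, 3.0])
feedback = softmax_backward(out, [0.0, 0.0, 1.0])
```

The outputs sum to 1, and the feedback entries sum to 0 whenever the target also sums to 1.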
I have a fully Connected class: m -> n
Forward Pass: It takes an input vector of size m.
I calculate the net activity vector net of size n, and an output vector of size n:
net[ i ] = b[ i ];
for (int j = 0; j < m; j++) net[ i ] += w[ i ][ j ] * input[ j ]
output [ i ] = 1 / (1 + e^-net[ i ])
Backward Pass: It takes a feedback vector of size n from the following layer.
b'[ i ] = b[ i ] + feedback[ i ] * 1 * learningRate
w'[ i ][ j ] = w[ i ][ j ] + feedback[ i ] * input[ j ] * learningRate
The new feedback array of size m:
feedback'[ i ] = 0;
feedback'[ i ] += feedback[ j ] * weights[ i ][ j ] * (output[ j ] * (1 - output[ j ]))
Of course, the feedback from one fully connected layer will be passed to the next, and so on.
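For comparison, here is a sketch of the textbook backward pass for a sigmoid fully connected layer, in Python. It is not a diagnosis of the code in the question. The names `fc_forward` and `fc_backward` are hypothetical, and it assumes that `feedback[i]` is the derivative of the loss with respect to `output[i]` and that the loss is minimized by gradient descent (so the updates subtract).

```python
import math

def fc_forward(w, b, inputs):
    # w has n rows and m columns: w[i][j] connects input j to output i.
    net = [b[i] + sum(w[i][j] * inputs[j] for j in range(len(inputs)))
           for i in range(len(b))]
    return [1.0 / (1.0 + math.exp(-v)) for v in net]

def fc_backward(w, b, inputs, output, feedback, learning_rate):
    n, m = len(b), len(inputs)
    # delta[i] = dLoss/dnet[i]: incoming feedback times the sigmoid derivative.
    delta = [feedback[i] * output[i] * (1.0 - output[i]) for i in range(n)]
    # Feedback for the previous layer, computed with the weights before the update.
    prev_feedback = [sum(delta[i] * w[i][j] for i in range(n)) for j in range(m)]
    for i in range(n):
        b[i] -= learning_rate * delta[i]
        for j in range(m):
            w[i][j] -= learning_rate * delta[i] * inputs[j]
    return prev_feedback
```

Note that the sigmoid derivative is taken at the layer's own outputs, and that each entry of the new feedback vector sums over all n outputs.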
I've been reading a few articles and found this one quite nice:
I feel like my implementation should be identical to what I read in such papers, but even after a small number of training examples (~100) my network output is getting close to a constant, basically as if it depended only on the biases.
So, could someone please give me a hint if I'm wrong with my theoretical understanding, or if I just have some issues with my implementation?


Reverse Factor Algorithm

I'm a newbie. I've been learning about algorithms for two months now. I'm generally doing okay but I'm not good at understanding search algorithms nor implementing them. I'm particularly stuck on this pattern search algorithm, Reverse Factor. I've been researching it for a week now but I still don't completely understand it, let alone implement it. I don't have anyone I could ask but I don't want to skip any algorithms.
So far I've found this algorithm. But I don't understand it well. I'm also not a native speaker. Can you help me?
The purpose is to search for a pattern p in a string t.
Algorithm RF /* reverse factor string matching */
/* denote t[i + j .. i + m] by x;
   it is the last-scanned part of the text */
i := 0;
while i ≤ n - m do
    j := m;
    while j > 1 and x ∈ FACT(p)
        do j := j - 1;
    /* in fact, we check the equivalent condition x^R ∈ FACT(p^R) */
    if x = p then
        report a match at position i;
    shift := RF shift[x];
    i := i + shift;
Fact(p) is the set of all factors (substrings) of p.
Thank you in advance.
I will make a try:
i := 0;
while i ≤ n - m do                 // start at character 0
    j := m;                         // start at character i + m (the potentially last character)
    while j > 1 and x ∈ FACT(p)
        do j := j - 1;              // step back as long as t[i+j, i+m] is a substring of the pattern p
    /* in fact, we check the equivalent condition x^R ∈ FACT(p^R) */
    if x = p then                   // x = t[i+0, i+m] == p
        report a match at position i;
    shift := RF shift[x];           // look up the number of chars to advance
    i := i + shift;                 // advance
The construction of the array shift is quite hard. I cannot remember how this is done. However I could say what one would find at shift[x].
shift[x] = the number of safe character shifts, i.e. how far the window can advance such that the next search does not miss a match.
Example: Having a string abcabcdab and a pattern bcd (| is i+m, * is i+j):
abc*|abcdab // start with i=0,j=3
ab*c|abcdab // c is a factor => continue
a*bc|abcdab // bc is a factor => continue
*abc|abcdab // abc is not a factor => shift = shift[bc] = 1
abc*a|bcdab // a is not a factor => shift = shift[] = 3
abcabc*d|ab // d is a factor => continue
abcab*cd|ab // cd is a factor => continue
abca*bcd|ab // bcd is a factor and j = 0 => report match
See here for an example for debugging in Java. It is not as simple as your pseudocode, but you may debug it for better understanding.
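To experiment with the idea without building the shift table, here is a small Python sketch. It is a simplification, not the real RF algorithm: the factor test uses Python's substring operator instead of a suffix automaton of the reversed pattern, so it is much slower. The shift rule used here (window length minus the longest scanned suffix that is a proper prefix of the pattern) is an assumption on my part about what the shift table encodes. The name `reverse_factor_search` is hypothetical.

```python
def reverse_factor_search(text, pattern):
    n, m = len(text), len(pattern)
    matches = []
    if m == 0 or m > n:
        return matches
    i = 0
    while i <= n - m:
        j = m               # text[i + j:i + m] is the part scanned so far
        longest_prefix = 0  # longest scanned suffix (shorter than m) that is a prefix of pattern
        # Extend the scanned part to the left while it stays a factor of the pattern.
        while j > 0 and text[i + j - 1:i + m] in pattern:
            j -= 1
            scanned = text[i + j:i + m]
            if len(scanned) < m and pattern.startswith(scanned):
                longest_prefix = len(scanned)
        if j == 0:
            matches.append(i)  # the whole window is a factor of length m, so it equals pattern
        i += m - longest_prefix
    return matches

positions = reverse_factor_search("abcabcdab", "bcd")
```

On the example string above, the windows visited start at positions 0, 1 and 4, which mirrors the trace.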

Google Combinatorial Optimization interview problem

I got asked this question in an interview for Google a couple of weeks ago. I didn't quite get the answer, and I was wondering if anyone here could help me out.
You have an array with n elements. The elements are either 0 or 1.
You want to split the array into k contiguous subarrays. The size of each subarray can vary between ceil(n/2k) and floor(3n/2k). You can assume that k << n.
After you split the array into k subarrays. One element of each subarray will be randomly selected.
Devise an algorithm for maximizing the sum of the randomly selected elements from the k subarrays.
Basically means that we will want to split the array in such way such that the sum of all the expected values for the elements selected from each subarray is maximum.
You can assume that n is a power of 2.
Array: [0,0,1,1,0,0,1,1,0,1,1,0]
n = 12
k = 3
Size of subarrays can be: 2,3,4,5,6
Possible subarrays [0,0,1] [1,0,0,1] [1,0,1,1,0]
Expected Value of the sum of the elements randomly selected from the subarrays: 1/3 + 2/4 + 3/5 = 43/30 ~ 1.4333333
Optimal split: [0,0,1,1,0,0][1,1][0,1,1,0]
Expected value of optimal split: 1/3 + 1 + 1/2 = 11/6 ~ 1.83333333
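The two expected values above can be checked with exact fractions. This is just a small sketch; `expected_sum` is a hypothetical helper name.

```python
from fractions import Fraction

def expected_sum(parts):
    # Each part contributes (number of ones) / (part length).
    return sum(Fraction(sum(p), len(p)) for p in parts)

first = expected_sum([[0, 0, 1], [1, 0, 0, 1], [1, 0, 1, 1, 0]])
best = expected_sum([[0, 0, 1, 1, 0, 0], [1, 1], [0, 1, 1, 0]])
print(first, float(first))
print(best, float(best))
```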
I think we can solve this problem using dynamic programming.
Basically, we have:
f(i,j) is defined as the maximum sum of all expected values chosen from an array of size i and split into j subarrays. Therefore the solution should be f(n,k).
The recursive equation is:
f(i,j) = max over x of [ f(i-x,j-1) + sum(i-x+1,i)/x ], where ceil(n/2k) <= x <= floor(3n/2k) and sum(a,b) is the number of ones in positions a..b
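A memoized sketch of this recursion in Python, under the assumption that f(0,0) = 0 and that any state which cannot be split into parts of a legal size is worth minus infinity. The name `best_split_value` is hypothetical, and the size bounds are computed with integer arithmetic.

```python
from functools import lru_cache

def best_split_value(arr, k):
    n = len(arr)
    lo = -(-n // (2 * k))   # ceil(n / 2k)
    hi = (3 * n) // (2 * k)  # floor(3n / 2k)
    prefix = [0]
    for v in arr:
        prefix.append(prefix[-1] + v)

    @lru_cache(maxsize=None)
    def f(i, j):
        # Best expected sum for the first i elements split into j parts.
        if j == 0:
            return 0.0 if i == 0 else float("-inf")
        best = float("-inf")
        for x in range(lo, min(hi, i) + 1):
            best = max(best, f(i - x, j - 1) + (prefix[i] - prefix[i - x]) / x)
        return best

    return f(n, k)

value = best_split_value([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0], 3)
```

For the example array this agrees with the optimal split given above.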
I don't know if this is still an open question or not, but it seems like the OP has managed to add enough clarifications that this should be straightforward to solve. At any rate, if I am understanding what you are saying this seems like a fair thing to ask in an interview environment for a software development position.
Here is the basic O(n^2 * k) solution, which should be adequate for small k (as the interviewer specified):
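from numpy import ceil, floor

def best_val(arr, K):
    n = len(arr)
    lo = int(ceil(n / (2 * K)))         # smallest allowed box size
    hi = int(floor((3 * n) / (2 * K)))  # largest allowed box size
    psum = [0.0]
    for x in arr:
        psum.append(psum[-1] + x)
    # tab[s] = best value for arr[s:] using the boxes placed so far
    tab = [float("-inf")] * n + [0.0]
    for k in range(K):
        new = [float("-inf")] * (n + 1)
        for s in range(n - lo + 1):
            terms = range(s + lo, min(s + hi, n) + 1)
            new[s] = max((psum[t] - psum[s]) / (t - s) + tab[t] for t in terms)
        tab = new
    return tab[0]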
I used the numpy ceil/floor functions, but you basically get the idea. The only tricks in this version are that it does windowing to reduce the memory overhead to just O(n) instead of O(n * k), and that it precalculates the partial sums to make computing the expected value for a box a constant-time operation (thus saving a factor of O(n) from the inner loop).
I don't know if anyone is still interested to see the solution for this problem. Just stumbled upon this question half an hour ago and thought of posting my solution(Java). The complexity for this is O(n*K^log10). The proof is a little convoluted so I would rather provide runtime numbers:
n k time(ms)
48 4 25
48 8 265
24 4 20
24 8 33
96 4 51
192 4 143
192 8 343919
The solution is the same old recursive one where given an array, choose the first partition of size ceil(n/2k) and find the best solution recursively for the rest with number of partitions = k -1, then take ceil(n/2k) + 1 and so on.
import java.util.Date;

public class PartitionOptimization {
    public static void main(String[] args) {
        PartitionOptimization p = new PartitionOptimization();
        int[] input = { 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0 };
        int splitNum = 3;
        int lowerLim = (int) Math.ceil(input.length / (2.0 * splitNum));
        int upperLim = (int) Math.floor((3.0 * input.length) / (2.0 * splitNum));
        System.out.println(input.length + " " + lowerLim + " " + upperLim);
        Date currDate = new Date();
        System.out.println(p.getMaxPartExpt(input, lowerLim, upperLim,
                splitNum, 0));
        System.out.println(new Date().getTime() - currDate.getTime());
    }

    public double getMaxPartExpt(int[] input, int lowerLim, int upperLim,
            int splitNum, int startIndex) {
        if (splitNum <= 1 && startIndex <= (input.length - lowerLim + 1)) {
            double expt = findExpectation(input, startIndex, input.length - 1);
            return expt;
        }
        if (!((input.length - startIndex) / lowerLim >= splitNum))
            return -1;
        double maxExpt = 0;
        double curMax = 0;
        int bestI = 0;
        for (int i = startIndex + lowerLim - 1; i < Math.min(startIndex
                + upperLim, input.length); i++) {
            double curExpect = findExpectation(input, startIndex, i);
            double splitExpect = getMaxPartExpt(input, lowerLim, upperLim,
                    splitNum - 1, i + 1);
            if (splitExpect >= 0 && (curExpect + splitExpect > maxExpt)) {
                bestI = i;
                curMax = curExpect;
                maxExpt = curExpect + splitExpect;
            }
        }
        return maxExpt;
    }

    public double findExpectation(int[] input, int startIndex, int endIndex) {
        double expectation = 0;
        for (int i = startIndex; i <= endIndex; i++) {
            expectation = expectation + input[i];
        }
        expectation = (expectation / (endIndex - startIndex + 1));
        return expectation;
    }
}
Not sure I understand, the algorithm is to split the array in groups, right? The maximum value the sum can have is the number of ones. So split the array in "n" groups of 1 element each and the addition will be the maximum value possible. But it must be something else and I did not understand the problem, that seems too silly.
I think this can be solved with dynamic programming. At each possible split location, get the maximum sum if you split at that location and if you don't split at that point. A recursive function and a table to store history might be useful.
sum_i = max{ NumOnesNewPart/NumZerosNewPart * sum(NewPart) + sum(A_i+1, A_end),
             sum(A_0,A_i+1) + sum(A_i+1, A_end) }
This might lead to something...
I think it's a bad interview question, but it is also an easy problem to solve.
Every integer contributes to the expected value with weight 1/s where s is the size of the set where it has been placed. Therefore, if you guess the sizes of the sets in your partition, you just need to fill the sets with ones starting from the smallest set, and then fill the remaining largest set with zeroes.
You can easily see then that if you have a partition, filled as above, where the sizes of the sets are S_1, ..., S_k and you do a transformation where you remove one item from set S_i and move it to set S_i+1, you have the following cases:
Both S_i and S_i+1 were filled with ones; then the expected value does not change
Both them were filled with zeroes; then the expected value does not change
S_i contained both 1's and 0's and S_i+1 contains only zeroes; moving 0 to S_i+1 increases the expected value because the expected value of S_i increases
S_i contained 1's and S_i+1 contains both 1's and 0's; moving 1 to S_i+1 increases the expected value because the expected value of S_i+1 increases and S_i remains intact
In all these cases, you can shift an element from S_i to S_i+1, maintaining the filling rule of filling smallest sets with 1's, so that the expected value increases. This leads to the simple algorithm:
Create a partitioning where there is a maximal number of maximum-size arrays and maximal number of minimum-size arrays
Fill the arrays starting from smallest one with 1's
Fill the remaining slots with 0's
How about a recursive function:
double BestValue(Array A, int numSplits)
    // Returns the best value that would be obtained by splitting
    // into numSplits partitions.
This in turn uses a helper:
// The additional argument is an array of the valid split sizes which
// is the same for each call.
double BestValueHelper(Array A, int numSplits, Array splitSizes)
    double result = 0;
    for splitSize in splitSizes
        // ExpectedValue takes inclusive indices, so the first split is A[0..splitSize-1]
        double splitResult = ExpectedValue(A, 0, splitSize - 1) +
                             BestValueHelper(A+splitSize, numSplits-1, splitSizes);
        if splitResult > result
            result = splitResult;
    return result;
ExpectedValue(Array A, int l, int m) computes the expected value of a split of A that goes from l to m i.e. (A[l] + A[l+1] + ... A[m]) / (m-l+1).
BestValue calls BestValueHelper after computing the array of valid split sizes between ceil(n/2k) and floor(3n/2k).
I have omitted error handling and some end conditions but those should not be too difficult to add.
a[] = given array of length n
from = inclusive index of array a
k = number of required splits
minSize = minimum size of a split
maxSize = maximum size of a split
d = maxSize - minSize
expectation(a, from, to) = average of all element of array a from "from" to "to"
Optimal(a[], from, k) = MAX[ for(j>=minSize-1 to <=maxSize-1) { expectation(a, from, from+j) + Optimal(a, from+j+1, k-1)} ]
Runtime (assuming memoization or dp) = O(n*k*d)

Efficient random sampling of constrained n-dimensional space

I'm about to optimize a problem that is defined by n (n>=1, typically n=4) non-negative variables. This is not an n-dimensional problem since the sum of all the variables needs to be 1.
The most straightforward approach would be for each x_i to scan the entire range 0<=x_i<1, and then normalize all the values to the sum of all the x's. However, this approach introduces redundancy, which is a problem for many optimization algorithms that rely on stochastic sampling of the solution space (genetic algorithm, tabu search and others). Is there any alternative algorithm that can perform this task?
What do I mean by redundancy?
Take the two-dimensional case as an example. Without the constraints, this would be a two-dimensional problem which would require optimizing two variables. However, due to the requirement that X1 + X2 == 1, one only needs to optimize one variable, since X2 is determined by X1 and vice versa. Had one decided to scan X1 and X2 independently and normalize them to a sum of 1, many solution candidates would have been identical vis-a-vis the problem. For example, (X1==0.1, X2==0.1) is identical to (X1==0.5, X2==0.5).
If you are dealing with real-valued variables, then ending up with two samples that become identical is quite unlikely. However, you do have the problem that your samples would not be uniform: you are much more likely to choose (0.5, 0.5) than (1.0, 0). One way of fixing this is subsampling: when a region of space is shrunk onto a certain point, you shrink the probability of choosing that point accordingly.
What the normalization does is map all the points inside the unit cube that lie in the same direction from the origin to a single point. The points in the same direction form a line. The longer the line, the larger the probability that you will choose the projected point. Hence you want to bias the probability of choosing a point by the inverse of the length of that line.
Here is code that can do it (assuming you are looking for the x_i to sum up to 1):
while (true) {
    maximum = 0;
    norm = 0;
    sum = 0;
    for (i = 0; i < N; i++) {
        x[i] = random(0,1);
        maximum = max(x[i], maximum);
        sum += x[i];
        norm += x[i] * x[i];
    }
    norm = sqrt(norm);
    length_of_line = norm/maximum;
    sample_probability = 1/length_of_line;
    if (sum == 0 || random(0,1) > sample_probability) {
        continue;  // reject this point and draw again
    } else {
        for (i = 0; i < N; i++) {
            x[i] = x[i] / sum;
        }
        return x;
    }
}
Here is the same function provided earlier by Amit Prakash, translated to Python:
import numpy as np

def f(N):
    while True:
        x = np.random.rand(N)
        mxm = np.max(x)
        theSum = np.sum(x)
        nrm = np.sqrt(np.sum(x * x))
        length_of_line = nrm / mxm
        sample_probability = 1 / length_of_line
        if theSum == 0 or np.random.rand() > sample_probability:
            continue  # reject and draw again
        x = x / theSum
        return x
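For reference, a different and widely used technique (not the rejection scheme from the answers above) draws n independent exponential variates and divides them by their sum. The result is a Dirichlet(1, ..., 1) sample, which is uniform over the set of non-negative vectors summing to 1, and no draws are rejected. The name `sample_simplex` is hypothetical; only the standard library is used.

```python
import random

def sample_simplex(n, rng=random):
    # i.i.d. Exp(1) draws divided by their sum are uniform on the simplex
    # (equivalently, a Dirichlet(1, ..., 1) sample).
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

x = sample_simplex(4)
```

Each call costs n random draws, which may matter when the sampler sits inside a stochastic optimizer.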

How to sort three variables using at most two swaps?

The following algorithm can sort three variables x, y and z of type K which are comparable using operator<:
void sort2(K& x, K& y) {
    if (y < x)
        swap(x, y);
}

void sort3(K& x, K& y, K& z) {
    sort2(x, y);
    sort2(y, z);
    sort2(x, y);
}
This needs three swaps in the "worst case". However, basic mathematics tells us that three values can always be ordered using only two swaps.
Example: The values (c,b,a) will be sorted using three swaps: (c,b,a) -> (b,c,a) -> (b,a,c) -> (a,b,c). However one swap would have been enough: (c,b,a) -> (a,b,c).
What would be the simplest algorithm that sorts three variables with at most two swaps in all cases?
Find the smallest, this takes 2 comparisons, and swap it into the first position.
Then compare the remaining 2 and swap if necessary.
if (x < y) {
    if (z < x) swap(x,z);
} else {
    if (y < z) swap(x,y);
    else swap(x,z);
}
if (z < y) swap(y,z);
This takes 3 comparisons, but only two swaps.
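A quick way to convince yourself is to run the same logic over all six orderings and count the swaps. Here is a sketch in Python; `sort3_two_swaps` is a hypothetical name, and tuple assignment stands in for `swap`.

```python
from itertools import permutations

def sort3_two_swaps(values):
    x, y, z = values
    swaps = 0
    # Step 1: move the smallest value into x (two comparisons, at most one swap).
    if x < y:
        if z < x:
            x, z = z, x
            swaps += 1
    else:
        if y < z:
            x, y = y, x
        else:
            x, z = z, x
        swaps += 1
    # Step 2: order the remaining pair (one comparison, at most one swap).
    if z < y:
        y, z = z, y
        swaps += 1
    return [x, y, z], swaps

results = [sort3_two_swaps(p) for p in permutations([1, 2, 3])]
```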
void sort(int& a, int& b, int& c)
{
    swap(a, min(a, min(b, c)));
    swap(b, min(b, c));
}
2 swaps, 3 comparisons.
2 to 3 comparisons, 0 to ~1.7 swaps
Old question, new answer... The following algorithm sorts x, y and z with 2 to 3 comparisons depending on their values and 0 to ~1.7 swap operations.
void sort3(K& x, K& y, K& z)
{
    if (y < x) {
        if (z < x) {
            if (z < y) {
                swap(x, z);
            } else {
                K tmp = std::move(x);
                x = std::move(y);
                y = std::move(z);
                z = std::move(tmp);
            }
        } else {
            swap(x, y);
        }
    } else {
        if (z < y) {
            if (z < x) {
                K tmp = std::move(z);
                z = std::move(y);
                y = std::move(x);
                x = std::move(tmp);
            } else {
                swap(y, z);
            }
        }
    }
}
So, how does it work? It's basically an unrolled insertion sort: if the values are already sorted (it takes 2 comparisons to check that) then the algorithm does not swap anything. Otherwise, it performs 1 or 2 swap operations. However, when 2 swap operations are required, the algorithm « rotates » the values instead so that 4 moves are performed instead of 6 (a swap operation should cost 3 moves, unless optimized).
There are only 6 possible permutations of 3 values. This algorithm does the comparisons needed to know which permutation we're treating. Then it does the swapping and leaves. Therefore, the algorithm has 6 possible paths (including the one where it does nothing because the array is already sorted). While it's still human-readable, an equivalently optimal algorithm to sort 4 values would have 24 different paths and would be much harder to read (for n values, there are n! possible permutations).
Since we're already in 2015 and you seemed to be using C++, I took the liberty of using std::move to make sure that the swap-rotate thingy would be efficient enough and would work even for movable but non-copyable types.
Find the minimum value and swap it with the first value. Find the second minimum and swap it with the second value. Two swaps at most.
This is basically selection sort, which will perform at most n - 1 swaps.
If you don't do it in place, you can perform it without any swaps.
Encode a sorting network in a table. The Wikipedia article I linked should help you with references in case you need to figure out what to put in the table in other cases (i.e., bigger arrays).
I think what you want is to find the optimal swap in each step instead of just a valid swap. To do that, just find the greatest difference between an element and an element later in the list and swap those. In a 3-tuple, there are three possible swaps, 1-3, 1-2, and 2-3. At each step find the max difference among these three swaps and do that. Pretty sure that gives two swaps in the worst case for 3 elements. Only really makes sense if swapping is relatively expensive compared to comparing elements, otherwise probably not worth the additional analysis upfront.
Cool question :)
If assembly is available to you, and the values fit in a register, then you can probably do it extremely fast by just loading them into registers and doing a few compares, jumping to the right scenario to put the values back. Maybe your compiler makes this optimization already.
Either way, if performance is your goal, take a look at the generated machine code and optimize there. For such a small algorithm that's where you can squeeze performance out of.
I recently had to solve a similar problem - sort three values efficiently. You concentrate on swap-operations in your question. If performance is what you are looking for, concentrate on the comparison operations and branches! When sorting such a "tiny" array with just three values, a good idea is to consider using additional storage, which is appropriate for so few values. I came up with something like a specialized "merge sort" (see code below).
Just as tenfour suggests, I looked at the assembly, and the code below compiles down to a compact inline set of CPU-register operations, and is extremely fast. The additional variable "arr12" is also stored in the CPU-registers. The sorting requires two or three comparison operations. The function can easily be converted to a template (not given here for clarity).
inline void sort3_descending( double * arr )
{
    double arr12[ 2 ];
    // sort first two values
    if( arr[ 0 ] > arr[ 1 ] )
    {
        arr12[ 0 ] = arr[ 0 ];
        arr12[ 1 ] = arr[ 1 ];
    } // if
    else
    {
        arr12[ 0 ] = arr[ 1 ];
        arr12[ 1 ] = arr[ 0 ];
    } // else
    // decide where to put arr12 and the third original value arr[ 2 ]
    if( arr12[ 1 ] > arr[ 2 ] )
    {
        arr[ 0 ] = arr12[ 0 ];
        arr[ 1 ] = arr12[ 1 ];
    } // if
    else if( arr[ 2 ] > arr12[ 0 ] )
    {
        arr[ 0 ] = arr [ 2 ];
        arr[ 1 ] = arr12[ 0 ];
        arr[ 2 ] = arr12[ 1 ];
    } // if
    else
    {
        arr[ 0 ] = arr12[ 0 ];
        arr[ 1 ] = arr [ 2 ];
        arr[ 2 ] = arr12[ 1 ];
    } // else
}
This can be illustrated with a truth table covering every possible combination of comparisons, to see how we can best optimize the swaps you mention here.
Values | x < y | y < z | x < z
x,y,z | y | y | y
x,z,y | y | n | y
y,x,z | n | y | y
y,z,x | n | y | n
z,x,y | y | n | n
z,y,x | n | n | n
By framing the question this way, we can easily see that by initially checking and swapping the 1st and 3rd element, the lowest value that we can have in the first element after the swap can either be x or y. This simplifies the if check afterwards so that we can either swap the 1st and 2nd element when x > y or swap the 2nd and 3rd element when y > z.
if (x > z) {
    swap(x, z);
}
if (x > y) {
    swap(x, y);
} else if (y > z) {
    swap(y, z);
}
No need for any nested if conditionals. Just 2-3 simple comparisons for 2 swaps at max.

Algorithm for sampling without replacement?

I am trying to test the likelihood that a particular clustering of data has occurred by chance. A robust way to do this is Monte Carlo simulation, in which the associations between data and groups are randomly reassigned a large number of times (e.g. 10,000), and a metric of clustering is used to compare the actual data with the simulations to determine a p value.
I've got most of this working, with pointers mapping the grouping to the data elements, so I plan to randomly reassign pointers to data. THE QUESTION: what is a fast way to sample without replacement, so that every pointer is randomly reassigned in the replicate data sets?
For example (these data are just a simplified example):
Data (n=12 values) - Group A: 0.1, 0.2, 0.4 / Group B: 0.5, 0.6, 0.8 / Group C: 0.4, 0.5 / Group D: 0.2, 0.2, 0.3, 0.5
For each replicate data set, I would have the same cluster sizes (A=3, B=3, C=2, D=4) and data values, but would reassign the values to the clusters.
To do this, I could generate random numbers in the range 1-12, assign the first element of group A, then generate random numbers in the range 1-11 and assign the second element in group A, and so on. The pointer reassignment is fast, and I will have pre-allocated all data structures, but the sampling without replacement seems like a problem that might have been solved many times before.
Logic or pseudocode preferred.
Here's some code for sampling without replacement based on Algorithm 3.4.2S of Knuth's book Seminumerical Algorithms.
void SampleWithoutReplacement
(
    int populationSize,    // size of set sampling from
    int sampleSize,        // size of each sample
    vector<int> & samples  // output, zero-offset indices to selected items
)
{
    // Use Knuth's variable names
    int& n = sampleSize;
    int& N = populationSize;
    int t = 0; // total input records dealt with
    int m = 0; // number of items selected so far
    double u;
    while (m < n)
    {
        u = GetUniform(); // call a uniform(0,1) random number generator
        if ( (N - t)*u >= n - m )
        {
            t++;             // record t is skipped
        }
        else
        {
            samples[m] = t;  // record t is selected
            t++; m++;
        }
    }
}
There is a more efficient but more complex method by Jeffrey Scott Vitter in "An Efficient Algorithm for Sequential Random Sampling," ACM Transactions on Mathematical Software, 13(1), March 1987, 58-67.
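The selection-sampling idea can be sketched in a few lines of Python: walk through the indices 0..N-1 once and select index t with probability (still needed) / (still remaining). The name `selection_sample` is hypothetical, and the generator passed as `rng` is assumed to provide a `random()` method returning a float in [0, 1).

```python
import random

def selection_sample(population_size, sample_size, rng=random):
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must be between 0 and population_size")
    chosen = []
    t = 0  # records examined so far
    m = 0  # records selected so far
    while m < sample_size:
        u = rng.random()
        # Select record t with probability (sample_size - m) / (population_size - t).
        if (population_size - t) * u < sample_size - m:
            chosen.append(t)
            m += 1
        t += 1
    return chosen

indices = selection_sample(12, 5)
```

The indices come out in increasing order, so if the order of assignment matters (as when filling groups A to D in turn), the result would still need to be shuffled.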
Working C++ code based on the answer by John D. Cook.
#include <random>
#include <vector>
// John D. Cook, https://stackoverflow.com/a/311716/15485
void SampleWithoutReplacement
(
    int populationSize,         // size of set sampling from
    int sampleSize,             // size of each sample
    std::vector<int> & samples  // output, zero-offset indices to selected items
)
{
    // Use Knuth's variable names
    int& n = sampleSize;
    int& N = populationSize;
    int t = 0; // total input records dealt with
    int m = 0; // number of items selected so far
    std::default_random_engine re;
    std::uniform_real_distribution<double> dist(0,1);
    while (m < n)
    {
        double u = dist(re); // call a uniform(0,1) random number generator
        if ( (N - t)*u >= n - m )
        {
            t++;             // record t is skipped
        }
        else
        {
            samples[m] = t;  // record t is selected
            t++; m++;
        }
    }
}
#include <iostream>
int main(int,char**)
{
    const size_t sz = 10;
    std::vector< int > samples(sz);
    // population size 10*sz is an assumed example value
    SampleWithoutReplacement(10*sz, sz, samples);
    for (size_t i = 0; i < sz; i++ ) {
        std::cout << samples[i] << "\t";
    }
    return 0;
}
See my answer to this question Unique (non-repeating) random numbers in O(1)?. The same logic should accomplish what you are looking to do.
Inspired by @John D. Cook's answer, I wrote an implementation in Nim. At first I had difficulties understanding how it works, so I commented it extensively and included an example. Maybe it helps to understand the idea. Also, I have changed the variable names slightly.
iterator uniqueRandomValuesBelow*(N, M: int) =
  ## Returns a total of M unique random values i with 0 <= i < N
  ## These indices can be used to construct e.g. a random sample without replacement
  assert(M <= N)
  var t = 0 # total input records dealt with
  var m = 0 # number of items selected so far
  while (m < M):
    let u = random(1.0) # call a uniform(0,1) random number generator
    # meaning of the following terms:
    # (N - t) is the total number of remaining draws left (initially just N)
    # (M - m) is the number how many of these remaining draw must be positive (initially just M)
    # => Probability for next draw = (M-m) / (N-t)
    #    i.e.: (required positive draws left) / (total draw left)
    # This is implemented by the inequality expression below:
    # - the larger (M-m), the larger the probability of a positive draw
    # - for (N-t) == (M-m), the term on the left is always smaller => we will draw 100%
    # - for (N-t) >> (M-m), we must get a very small u
    # example: (N-t) = 7, (M-m) = 5
    # => we draw the next with prob 5/7
    #    lets assume the draw fails
    # => t += 1 => (N-t) = 6
    # => we draw the next with prob 5/6
    #    lets assume the draw succeeds
    # => t += 1, m += 1 => (N-t) = 5, (M-m) = 4
    # => we draw the next with prob 4/5
    #    lets assume the draw fails
    # => t += 1 => (N-t) = 4
    # => we draw the next with prob 4/4, i.e.,
    #    we will draw with certainty from now on
    #    (in the next steps we get prob 3/3, 2/2, ...)
    if (N - t)*u >= (M - m).toFloat: # this is essentially a draw with P = (M-m) / (N-t)
      # no draw -- happens mainly for (N-t) >> (M-m) and/or high u
      t += 1
    else:
      # draw t -- happens when (M-m) gets large and/or low u
      yield t # this is where we output an index, can be used to sample
      t += 1
      m += 1

# example use
for i in uniqueRandomValuesBelow(100, 5):
  echo i
When the population size is much greater than the sample size, the above algorithms become inefficient, since they have complexity O(n), n being the population size.
When I was a student I wrote some algorithms for uniform sampling without replacement, which have average complexity O(s log s), where s is the sample size. Here is the code for the binary tree algorithm, with average complexity O(s log s), in R:
# The Tree growing algorithm for uniform sampling without replacement
# by Pavel Ruzankin
quicksample = function (n,size)
# n - the number of items to choose from
# size - the sample size
if (s>n) {
stop("Sample size is greater than the number of items to choose from")
# upv=integer(s) #level up edge is pointing to
leftv=integer(s) #left edge is pointing to; must be filled with zeros
rightv=integer(s) #right edge is pointing to; must be filled with zeros
samp=integer(s) #the sample
ordn=integer(s) #relative ordinal number
ordn[1L]=1L #initial value for the root vertex
if (s > 1L) for (j in 2L:s) {
curn=sample(n-j+1L,1L) #current number sampled
curordn=0L #current ordinal number
v=1L #current vertex
from=1L #how have come here: 0 - by left edge, 1 - by right edge
repeat {
if (curn+curordn>samp[v]) { #going down by the right edge
if (from == 0L) {
if (rightv[v]!=0L) {
} else { #creating a new vertex
# upv[j]=v
} else { #going down by the left edge
if (from==1L) {
if (leftv[v]!=0L) {
} else { #creating a new vertex
# upv[j]=v
The complexity of this algorithm is discussed in:
Rouzankin, P. S.; Voytishek, A. V. On the cost of algorithms for random selection. Monte Carlo Methods Appl. 5 (1999), no. 1, 39-54.
If you find the algorithm useful, please make a reference.
See also:
P. Gupta, G. P. Bhattacharjee. (1984) An efficient algorithm for random sampling without replacement. International Journal of Computer Mathematics 16:4, pages 201-209.
DOI: 10.1080/00207168408803438
Teuhola, J. and Nevalainen, O. 1982. Two efficient algorithms for random sampling without replacement. /IJCM/, 11(2): 127–140.
DOI: 10.1080/00207168208803304
In the last paper the authors use hash tables and claim that their algorithms have O(s) complexity. There is one more fast hash table algorithm, which will soon be implemented in pqR (pretty quick R):
I wrote a survey of algorithms for sampling without replacement. I may be biased but I recommend my own algorithm, implemented in C++ below, as providing the best performance for many k, n values and acceptable performance for others. randbelow(i) is assumed to return a fairly chosen random non-negative integer less than i.
void cardchoose(uint32_t n, uint32_t k, uint32_t* result) {
auto t = n - k + 1;
for (uint32_t i = 0; i < k; i++) {
uint32_t r = randbelow(t + i);
if (r < t) {
result[i] = r;
} else {
result[i] = result[r - t];
std::sort(result, result + k);
for (uint32_t i = 0; i < k; i++) {
result[i] += i;
Another algorithm for sampling without replacement is described here.
It is similar to the one described by John D. Cook in his answer and is also from Knuth, but it makes a different assumption: the population size is unknown, but the sample can fit in memory. This one is called "Knuth's algorithm S".
Quoting the rosettacode article:
1. Select the first n items as the sample as they become available;
2. For the i-th item where i > n, have a random chance of n/i of keeping it. If failing this chance, the sample remains the same. If not, have it randomly (1/n) replace one of the previously selected n items of the sample.
3. Repeat #2 for any subsequent items.
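The quoted steps can be sketched as a short Python function. The name `reservoir_sample` is hypothetical; `stream` can be any iterable, so the population size does not need to be known in advance.

```python
import random

def reservoir_sample(stream, n, rng=random):
    sample = []
    for i, item in enumerate(stream, start=1):
        if i <= n:
            # Step 1: the first n items fill the sample.
            sample.append(item)
        elif rng.random() < n / i:
            # Step 2: keep item i with probability n/i, replacing a random slot.
            sample[rng.randrange(n)] = item
    return sample

picked = reservoir_sample(range(100), 5)
```

If the stream holds fewer than n items, the function simply returns all of them.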
