need woodcutting recursion advice - algorithm

You are given a log of wood of length 'n'. There are 'm' markings on the log. The log must be cut at each of the markings. The cost of a cut is equal to the length of the piece being cut. Given such a log, determine the least total cost of cutting.
My partial solution uses recursion:
I am able to get the cost when I go through the marking array in sequence, i.e. from the 0th cut to the last cut in the array. However, I am stuck on how to write code for the case where the cuts are not made in sequence, i.e. in an arbitrary order, so that the code accounts for every cutting order and takes the minimum over all of those cases.
One solution is to generate every permutation of the markings array, call the woodcut function for each permutation, and take the minimum, but that seems to be a naive approach.
Any suggestions?
marking = [2, 4] (cut points)
int woodcut(length, cut_point, index){
if (cut_point > length)
return INFINITY
first_half = cut_point;
second_half = length - cut_point
if (markings[index++] == exist) {
if (next_cut_point > first)
cost = length + woodcut(second_half, next_cut_point-first)
else
cost = length + woodcut(first_half, next_cut_point)
} else if (index >= sizeof(markings))
return cost;
}
http://www.careercup.com/question?id=5188262471663616
After looking up the answers and with some help from some generous folks, I was able to code up below solution:
#include <stdio.h>
#include <limits.h>
int min(int a, int b)
{
    return a > b ? b : a;
}
/* cuts[0] == first and cuts[size-1] == last are the ends of the current
   piece; cuts[1..size-2] are the marks strictly inside it */
int min_cut(int first, int last, int size, int *cuts)
{
    int i;
    int min_cost = INT_MAX;
    /* there are no cuts */
    if (size == 2)
        return 0;
    /* there is only one cut between the end points */
    if (size == 3)
        return last - first;
    /* cut at all the positions and take minimum of all */
    for (i = 1; i < size; i++) {
        if (cuts[i] > first && cuts[i] < last) {
            /* the right piece starts at cuts[i], so it gets cuts + i */
            int cost = last - first + min_cut(first, cuts[i], i + 1, cuts) +
                       min_cut(cuts[i], last, size - i, cuts + i);
            min_cost = min(cost, min_cost);
        }
    }
    return min_cost;
}
int main()
{
    int cuts[] = {0, 2, 4, 7, 10};
    int size = sizeof(cuts)/sizeof(cuts[0]);
    printf("%d", min_cut(cuts[0], cuts[size-1], size, cuts));
    return 0;
}

Approach A:
First write a naive recursive function that calculates the cheapest cost of cutting into pieces from the ith mark to the jth mark. Do that by taking the minimum over all possible first cuts of the cost of that first cut plus the minimum cost of cutting up the two side pieces.
Memoize this function so it is efficient.
Approach B:
Build a table of the cheapest cost of cutting into pieces from the ith mark to the jth mark. Do it with an outer loop over how far apart marks i and j are, then an inner loop over i, and then an innermost loop over the possible places to make the first cut.
Both methods work. Both will be O(m*m*m). I usually would go with approach A.
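Approach B could be sketched in Python roughly as follows. This is only an illustration: the function name `min_cut_cost` and the convention of passing the log length and the marks separately are assumptions, not part of the answer above.

```python
def min_cut_cost(length, marks):
    """Least total cost of cutting a log of `length` at every mark."""
    # Positions of the two ends plus every mark, in sorted order.
    pos = [0] + sorted(marks) + [length]
    k = len(pos)
    # cost[i][j]: cheapest way to make all cuts strictly between pos[i] and pos[j].
    cost = [[0] * k for _ in range(k)]
    for gap in range(2, k):            # outer loop: how far apart i and j are
        for i in range(k - gap):       # inner loop: left end of the piece
            j = i + gap
            # innermost loop: every possible place for the first cut
            best = min(cost[i][c] + cost[c][j] for c in range(i + 1, j))
            cost[i][j] = (pos[j] - pos[i]) + best
    return cost[0][k - 1]


print(min_cut_cost(10, [2, 4, 7]))
```

The three nested loops are where the O(m*m*m) bound comes from; the table itself needs O(m*m) space.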

Dynamic programming. Complexity O(m^3). Solution in Python. The input is an ordered list of marking positions, with the length of the log as the last item:
def log_cut(m):
    def _log_cut(a, b):
        if mat[a][b] is None:
            s = 0
            min_v = None
            for i in range(a+1, b):
                v = _log_cut(a, i) + _log_cut(i, b)
                if min_v is None or v < min_v:
                    min_v = v
            if min_v is not None:
                s = min_v + m[b-1]
                if a > 0:
                    s -= m[a-1]
            mat[a][b] = s
        return mat[a][b]
    mat = [[None for i in range(len(m)+1)] for j in range(len(m)+1)]
    s = _log_cut(0, len(m))
    return s

This scenario is analogous to divide-and-conquer sorting. Take quicksort, for example:
There is a partition step that requires a linear pass over an array to divide it into two subarrays. Similarly, the cost of cutting a log is equal to its length.
There is then a recursive step in which each subarray is recursively sorted. Similarly, you must recursively continue to cut each of the two pieces into which a log is cut, until you have cut at all marks.
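Quicksort is, of course, O(n log n) in the best case, which occurs when each partition step (except base cases) divides the array into two nearly-equally-sized subarrays. This suggests a heuristic: find the mark closest to the middle, "cut" the log there, and recurse. Note, however, that this greedy choice is not always optimal. For a log of length 8 with marks at 3, 4 and 5, cutting at the middle mark first costs 8 + 4 + 4 = 16, whereas cutting at 3, then 5, then 4 costs 8 + 5 + 2 = 15.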

Related

How to apply the Step-Count method to my binary search implementation

int binarySearch(int arr[], int left, int right, int x)
{
    while (left <= right)
    {
        int mid = (left+right)/2;
        if (arr[mid] == x)
        {
            return mid;
        }
        else if (arr[mid] > x)
        {
            right = mid-1;
        }
        else
        {
            left = mid+1;
        }
    }
    return -1;
}
When I went through this myself I got 5n+4 = O(n), but somehow it is supposed to be O(log N), and I don't understand why that's the case.
int mean(int a[], size_t n)
{
    int sum = 0;                    // 1 step
    for (int i = 0; i < n; i++)     // 1 step * (n+1)
        sum += a[i];                // 1 step * n
    return sum;                     // 1 step
}
I understand that the above code reduces to 2N+3 but this is a very basic example and doesn't take much thought to understand. Will someone please walk me through the binary search version as all the sources I have encountered don't make much sense to me.
Here is a link to one of the many other resources that I have used, but the explanation as to how each statement is separated into steps is what I prefer if possible.
how to calculate binary search complexity
In binary search you always reduce the problem size by half. Let's take an example: the element being searched for is 19 and the sorted array has 8 elements, [1,4,7,8,11,16,19,22]. A binary search then performs the following sequence of steps:
1. Get the middle element index, i.e. divide the problem size in half.
2. Check whether the element at that index is greater than, less than or equal to the element you are searching for.
a. If equal, you are done; return the index.
b. If the searched element is greater, keep looking in the right half of the array.
c. If the searched element is less, look in the left half of the array.
3. Repeat steps 1 and 2 until you have found the element or there is nothing left to search.
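In our example the problem will look as follows (using the code from the question, where mid = (left+right)/2):
Iteration 1: [1,4,7,8,11,16,19,22] (middle element 8 is less than 19)
Iteration 2: [11,16,19,22] (middle element 16 is less than 19)
Iteration 3: [19,22] (middle element 19 is the one we want)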
Order of complexity: O(log2(n))
i.e.
log2(8) = 3, which means we required 3 steps to find our desired element. Even if the element was not there (i.e. in the worst case), the number of steps stays proportional to log2(n).
It's important to note that the base of the log in binary search is 2 because we are halving the problem size. If some other algorithm reduced the problem size to 1/3 at each step, the base would be 3, but asymptotically we call the algorithm logarithmic irrespective of the base.
Note: Binary search can only be done on sorted data.
Suppose I have an array of 10 elements. Binary search will split the array into two halves, in this case 5 (call it L because these are the left 5 elements) and 5 (call it R because these are the right 5 elements).
Suppose the element you are trying to find is greater than the middle element, in this case x > array[5]. Then you just ignore the first 5 elements and go to the last five elements.
Now you have an array of five elements (from index 5 to the end). Again you split the array into two halves: if x > array[mid] you ignore the whole left part, and if it is smaller you ignore the whole right part.
In mathematical notation you get a series like this: n, n/2, n/2^2, ..., n/2^m
Now solve for m: the last term is n/2^m, which is down to a single element, so n/2^m = 1, and this has the solution m = log2(n).
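To connect this with the step-count method, one option is to count how often the loop body actually runs. The sketch below is a Python transcription of the question's loop with a counter added; the name `binary_search_steps` is made up for illustration. Each iteration performs a constant number of steps, so the total is that constant times the iteration count, and the iteration count never exceeds floor(log2(n)) + 1 rather than n.

```python
def binary_search_steps(arr, x):
    """Return (index of x or -1, number of loop iterations executed)."""
    left, right = 0, len(arr) - 1
    iterations = 0
    while left <= right:
        iterations += 1
        mid = (left + right) // 2
        if arr[mid] == x:
            return mid, iterations
        elif arr[mid] > x:
            right = mid - 1
        else:
            left = mid + 1
    return -1, iterations


# Searching for a value larger than every element always keeps the larger
# half, so it takes the most iterations the loop can need.
for n in (8, 1000, 10**6):
    _, iters = binary_search_steps(list(range(n)), n)
    # n.bit_length() equals floor(log2(n)) + 1 for n >= 1
    print(n, iters, n.bit_length())
```

The 5n+4 figure comes from assuming the loop runs n times; replacing that factor with floor(log2(n)) + 1 gives a total of the form c*log2(n) + d, which is O(log N).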

Find minimum cost to convert array to arithmetic progression

I recently encountered this question in an interview. I couldn't really come up with an algorithm for this.
Given an array of unsorted integers, we have to find the minimum cost in which this array can be converted to an Arithmetic Progression where a cost of 1 unit is incurred if any element is changed in the array. Also, the value of the element ranges between (-inf,inf).
I sort of realised that DP can be used here, but I couldn't solve the equation. There were some constraints on the values, but I don't remember them. I am just looking for high level pseudo code.
EDIT
Here's a correct solution, unfortunately, while simple to understand it's not very efficient at O(n^3).
function costAP(arr) {
    if(arr.length < 3) { return 0; }
    var minCost = arr.length;
    for(var i = 0; i < arr.length - 1; i++) {
        for(var j = i + 1; j < arr.length; j++) {
            var cost = 0;
            for(var k = 0; k < arr.length; k++) {
                if(k == i) { continue; }
                // arr[k] lies on the AP through i and j iff
                // (arr[k] - arr[i]) / (k - i) == (arr[j] - arr[i]) / (j - i);
                // cross-multiplied to avoid floating-point division
                if((arr[k] - arr[i]) * (j - i) != (arr[j] - arr[i]) * (k - i)) { cost++; }
            }
            if(cost < minCost) { minCost = cost; }
        }
    }
    return minCost;
}
Find the relative delta between every distinct pair of indices in the array
Use the relative delta to test the cost of transforming the whole array to AP using that delta
Return the minimum cost
Louis Ricci had the right basic idea of looking for the largest existing arithmetic progression, but assumed that it would have to appear in a single run, when in fact the elements of this progression can appear in any subset of the positions, e.g.:
1 42 3 69 5 1111 2222 8
requires just 4 changes:
42 69 1111 2222
1 3 5 8
To calculate this, notice that every AP has a rightmost element. We can suppose each element i of the input vector to be the rightmost AP position in turn, and for each such i consider all positions j to the left of i, determining the step size implied by each (i, j) combination. When this step is an integer (indicating a valid AP), add one to the number of elements that imply this step size and end at position i, since all such elements belong to the same AP. The overall maximum is then the longest AP:
#include <unordered_map>
#include <vector>
using namespace std;
struct solution {
    int len;
    int pos;
    int step;
};
solution longestArithProg(vector<int> const& v) {
    solution best = { -1, 0, 0 };
    for (int i = 1; i < (int)v.size(); ++i) {
        unordered_map<int, int> bestForStep;
        for (int j = 0; j < i; ++j) {
            int step = (v[i] - v[j]) / (i - j);
            if (step * (i - j) == v[i] - v[j]) {
                // This j gives an integer step size: record that j lies on this AP
                int len = ++bestForStep[step];
                if (len > best.len) {
                    best.len = len;
                    best.pos = i;
                    best.step = step;
                }
            }
        }
    }
    ++best.len; // We never counted the final element in the AP
    return best;
}
The above C++ code uses O(n^2) time and O(n) space, since it loops over every pair of positions i and j, performing a single hash read and write for each. To answer the original problem:
int howManyChangesNeeded(vector<int> const& v) {
return v.size() - longestArithProg(v).len;
}
This problem has a simple geometric interpretation, which shows that it can be solved in O(n^2) time and probably can't be solved any faster than that (reduction from 3SUM). Suppose our array is [1, 2, 10, 3, 5]. We can write that array as a sequence of points
(0,1), (1,2), (2,10), (3,3), (4,5)
in which the x-value is the index of the array item and the y-value is the value of the array item. The question now becomes one of finding a line which passes through the maximum possible number of points in that set. The cost of converting the array is the number of points not on that line, which is minimized when the number of points on the line is maximized.
A fairly definitive answer to that question is given in this SO posting: What is the most efficient algorithm to find a straight line that goes through most points?
The idea: for each point P in the set from left to right, find the line passing through that point and a maximum number of points to the right of P. (We don't need to look at points to the left of P because they would have been caught in an earlier iteration).
To find the maximum number of P-collinear points to the right of P, for each such point Q calculate the slope of the line segment PQ. Tally up the different slopes in a hash map. The slope which maps to the maximum number of hits is what you're looking for.
Technical issue: you probably don't want to use floating point arithmetic to calculate the slopes. On the other hand, if you use rational numbers, you potentially have to calculate the greatest common divisor in order to compare fractions by comparing numerator and denominator, which multiplies running time by a factor of log n. Instead, you should check equality of rational numbers a/b and c/d by testing whether ad == bc.
The SO posting referenced above gives a reduction from 3SUM, i.e., this problem is 3SUM-hard which shows that if this problem could be solved substantially faster than O(n^2), then 3SUM could also be solved substantially faster than O(n^2). This is where the condition that the integers are in (-inf,inf) comes in. If it is known that the integers are from a bounded set, the reduction from 3SUM is not definitive.
An interesting further question is whether the idea in the Wikipedia for solving 3SUM in O(n + N log N) time when the integers are in the bounded set (-N,N) can be used to solve the minimum cost to convert an array to an AP problem in time faster than O(n^2).
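The slope-tallying idea can be sketched in Python along these lines. The function name `min_changes_to_ap` is illustrative, and the sketch uses `fractions.Fraction` as the hash key for a slope; that is a simplification which does reduce fractions internally, rather than the gcd-free comparison suggested above. It also accepts any rational common difference; requiring an integer one would need an extra check.

```python
from collections import Counter
from fractions import Fraction


def min_changes_to_ap(arr):
    """Fewest elements to change so that arr becomes an arithmetic progression."""
    n = len(arr)
    if n <= 2:
        return 0
    best = 2  # any two points (index, value) lie on a common line
    for i in range(n - 1):
        # Tally the exact slope from point i to every point on its right.
        slopes = Counter(Fraction(arr[j] - arr[i], j - i) for j in range(i + 1, n))
        # Point i itself plus the largest group sharing one slope.
        best = max(best, 1 + max(slopes.values()))
    return n - best


print(min_changes_to_ap([1, 2, 10, 3, 5]))
```

This runs in O(n^2) hash operations, matching the bound discussed above.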
Given the array a = [a_1, a_2, ..., a_n] of unsorted integers, let diffs = [a_2-a_1, a_3-a_2, ..., a_n-a_(n-1)].
Find the maximum occurring value in diffs and adjust any values in a necessary so that all neighboring values differ by this amount.
Interestingly, I had the same question in my campus recruitment test today. While doing the test itself, I realised that this logic of altering elements based on the most frequent difference between 2 adjacent elements in the array fails in some cases.
E.g. 4, 5, 8, 9. According to the a2-a1, a3-a2 logic proposed above, the answer should be 1, which is not the case.
As you suggested DP, I feel it could be along the lines of considering 2 values for each element in the array, the cost when it is modified and the cost when it is not, and returning the minimum of the two. Finally, terminate when you reach the end of the array.

Minimal Number of Extract + Inserts required to sort a list

Context
this problem arises from trying to minimize number of expensive function calls
Problem Definition
Please note that extract_and_insert != swap. In particular, we take the element from position "from", insert it at position "to", and SHIFT all intermediate elements.
int n;
int A[n]; // all elements are integer and distinct
function extract_and_insert(from, to) {
    int old_value = A[from]
    if (from < to) {
        for(int i = from; i < to; ++i)
            A[i] = A[i+1];
        A[to] = old_value;
    } else {
        for(int i = from; i > to; --i)
            A[i] = A[i-1];
        A[to] = old_value;
    }
}
Question
We know there are O(n log n) algorithms for sorting a list of numbers.
Now: is there an O(n log n) function, which returns the minimum number of calls to extract_and_insert required to sort the list?
The answer is Yes.
This problem is essentially equivalent to finding the longest increasing subsequence (LIS) of the array, which standard algorithms solve in O(n log n) time.
Why is this question equivalent to longest increasing subsequence?
Because each extract_and_insert operation will, at its most effective use, correct the relative position of exactly one element in the array. In other words, each operation can increase the length of the longest increasing subsequence by at most 1, and a well-chosen operation always increases it by exactly 1. So, the minimum number of required calls is:
length_of_array - length_of_LIS
and therefore by finding the length of LIS, we will be able to find the minimum number of operations required.
Do read up the linked Wikipedia page to see how to implement the algorithm.
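As a rough illustration, the count can be computed in O(n log n) with the usual binary-search formulation of LIS. The function name `min_extract_and_inserts` is made up for this sketch, and it relies on the elements being distinct, as stated in the question.

```python
from bisect import bisect_left


def min_extract_and_inserts(a):
    """Minimum number of extract_and_insert calls needed to sort a (distinct values)."""
    # tails[k] is the smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far; tails stays sorted.
    tails = []
    for x in a:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[k] = x      # x is a smaller tail for length k + 1
    return len(a) - len(tails)


print(min_extract_and_inserts([3, 1, 2]))
```

Only the length of the LIS is needed here, so the sketch never reconstructs the subsequence itself.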

Algorithm on interview

Recently I was asked the following interview question:
You have two sets of numbers of the same length N, for example A = [3, 5, 9] and B = [7, 5, 1]. Next, for each position i in range 0..N-1, you can pick either number A[i] or B[i], so at the end you will have another array C of length N which consists of elements from A and B. If the sum of all elements in C is less than or equal to K, then such an array is good. Please write an algorithm to figure out the total number of good arrays for given arrays A, B and number K.
The only solution I've come up is Dynamic Programming approach, when we have a matrix of size NxK and M[i][j] represents how many combinations could we have for number X[i] if current sum is equal to j. But looks like they expected me to come up with a formula. Could you please help me with that? At least what direction should I look for? Will appreciate any help. Thanks.
After some consideration, I believe this is an NP-complete problem. Consider:
A = [0, 0, 0, ..., 0]
B = [b1, b2, b3, ..., bn]
Note that every construction of the third set C = ( A[i] or B[i] for i = 0..n-1 ) is just the union of some subset of A and some subset of B. In this case, since every subset of A sums to 0, the sum of C is the same as the sum of some subset of B.
Now your question "How many ways can we construct C with a sum less than K?" can be restated as "How many subsets of B sum to less than K?". Solving this problem for K = 1 and K = 0 yields the solution to the subset sum problem for B (the difference between the two solutions is the number of subsets that sum to 0).
By similar argument, even in the general case where A contains nonzero elements, we can construct an array S = [b1-a1, b2-a2, b3-a3, ..., bn-an], and the question becomes "How many subsets of S sum to less than K - sum(A)?"
Since the subset sum problem is NP-complete, this counting problem must be at least as hard. So with that in mind, I would venture that the dynamic programming solution you proposed is about the best you can do, and certainly no magic formula exists.
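For reference, the dynamic programming idea from the question could look roughly like this in Python. The sketch assumes all numbers are non-negative integers (so a partial sum above K can be discarded), and it counts choices of A[i] versus B[i], so a position with A[i] == B[i] contributes two ways. The name `count_good_arrays` is an assumption.

```python
def count_good_arrays(A, B, K):
    """Number of ways to pick A[i] or B[i] at each position with total <= K."""
    if K < 0:
        return 0
    # ways[s]: number of ways to reach partial sum s with the positions seen so far.
    ways = [0] * (K + 1)
    ways[0] = 1
    for a, b in zip(A, B):
        nxt = [0] * (K + 1)
        for s, w in enumerate(ways):
            if w == 0:
                continue
            if s + a <= K:
                nxt[s + a] += w
            if s + b <= K:
                nxt[s + b] += w
        ways = nxt
    return sum(ways)


print(count_good_arrays([3, 5, 9], [7, 5, 1], 13))
```

The running time is O(N*K), which is pseudo-polynomial: it depends on the value of K rather than on its size in bits, so it does not conflict with the hardness argument above.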
" Please write an algorithm to figure out the total number of good
arrays by given arrays A, B and number K."
Is it not the goal?
int A[];
int B[];
int N;
int K;
int Solutions = 0;
/* pruning on theSumSoFar > K assumes all elements are non-negative */
void FindSolutions(int Depth, int theSumSoFar) {
    if (theSumSoFar > K) return;
    if (Depth >= N) {
        Solutions++;
        return;
    }
    FindSolutions(Depth+1, theSumSoFar+A[Depth]);
    FindSolutions(Depth+1, theSumSoFar+B[Depth]);
}
Invoke FindSolutions with both arguments set to zero. On return the Solutions will be equal to the number of good arrays;
This is how I would try to solve the problem (sorry if it's stupid).
Think of the arrays
A=[3,5,9,8,2]
B=[7,5,1,8,2]
With elements 0..N-1, the number of choices is 2^N.
C1=0, C2=0
for all i where A[i]=B[i]
{
C1++
C2+=A[i]
}
Then create two new arrays without those positions, like
A1=[3,9]
B1=[7,1]
Now C1 is 3 and C2 is 15, and the number of choices that can change the sum is reduced to 2^(N-C1).
Now count the good arrays using K=K-C2.
Unfortunately, no matter what method you use, you still have to calculate the sum 2^(N-C1) times.
So there's 2^N choices, since at each point you either pick from A or from B. In the specific example you give where N happens to be 3 there are 8. For discussion you can characterise each set of decisions as a bit pattern.
So a brute-force approach would try every single bit pattern.
But what should be obvious is that if the first few bits produce a number too large then every subsequent possible group of tail bits will also produce a number that is too large. So probably a better way to model it is a tree where you don't bother walking down the limbs that have already grown beyond your limit.
You can also compute the maximum totals that can be reached from each bit to the end of the table. If at any point your running total plus the maximum that you can obtain from here on down is less than or equal to K, then every subtree from where you are is acceptable without any need for traversal. The case, as discussed in the comments, where every single combination is acceptable is a special case of this observation.
As pointed out by Serge below, a related observation is to use minimums and the converse logic to cancel whole subtrees without traversal.
A potential further optimisation rests behind the observation that, as long as we shuffle each in the same way, changing the order of A and B has no effect because addition is commutative. You can therefore make an effort to ensure either that the maximums grow as quickly as possible or the minimums grow as slowly as possible, to try to get the earliest possible exit from traversal. In practice you'd probably want to apply a heuristic comparing the absolute maximum and minimum (both of which you've computed anyway) to K.
That being the case, a recursive implementation is easiest, e.g. (in C)
/* assume A, B, N, MAXV and MINV are known globals */
unsigned int numberOfGoodArraysFromBit(
    unsigned int bit,
    unsigned int runningTotal,
    unsigned int limit)
{
    // have we ended up in an unacceptable subtree?
    if(runningTotal > limit) return 0;
    // have we reached the leaf node without at any
    // point finding this subtree to be unacceptable?
    if(bit >= N) return 1;
    // maybe every subtree is acceptable?
    if(runningTotal + MAXV[bit] <= limit)
    {
        return 1u << (N - bit);
    }
    // maybe no subtrees are acceptable?
    if(runningTotal + MINV[bit] > limit)
    {
        return 0;
    }
    // if we can't prima facie judge the subtrees,
    // we'll need specifically to evaluate them
    return
        numberOfGoodArraysFromBit(bit+1, runningTotal+A[bit], limit) +
        numberOfGoodArraysFromBit(bit+1, runningTotal+B[bit], limit);
}
// work out the minimum and maximum values at each position
for(int i = 0; i < N; i++)
{
    MAXV[i] = MAX(A[i], B[i]);
    MINV[i] = MIN(A[i], B[i]);
}
// hence work out the cumulative totals from right to left
for(int i = N-2; i >= 0; i--)
{
    MAXV[i] += MAXV[i+1];
    MINV[i] += MINV[i+1];
}
// to kick it off
printf("Total valid combinations is %u", numberOfGoodArraysFromBit(0, 0, K));
I'm just thinking extemporaneously; it's likely better solutions exist.

Most efficient way of randomly choosing a set of distinct integers

I'm looking for the most efficient algorithm to randomly choose a set of n distinct integers, where all the integers are in some range [0..maxValue].
Constraints:
maxValue is larger than n, and possibly much larger
I don't care if the output list is sorted or not
all integers must be chosen with equal probability
My initial idea was to construct a list of the integers [0..maxValue] then extract n elements at random without replacement. But that seems quite inefficient, especially if maxValue is large.
Any better solutions?
Here is an optimal algorithm, assuming that we are allowed to use hashmaps. It runs in O(n) time and space (and not O(maxValue) time, which is too expensive).
It is based on Floyd's random sample algorithm. See my blog post about it for details.
The code is in Java:
// imports needed at the top of the enclosing class file
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
private static Random rnd = new Random();
public static Set<Integer> randomSample(int max, int n) {
    HashSet<Integer> res = new HashSet<Integer>(n);
    int count = max + 1;
    for (int i = count - n; i < count; i++) {
        Integer item = rnd.nextInt(i + 1);
        if (res.contains(item))
            res.add(i);
        else
            res.add(item);
    }
    return res;
}
For small values of maxValue such that it is reasonable to generate an array of all the integers in memory then you can use a variation of the Fisher-Yates shuffle except only performing the first n steps.
If n is much smaller than maxValue and you don't wish to generate the entire array then you can use this algorithm:
1. Keep a sorted list l of the numbers picked so far, initially empty.
2. Pick a random number x between 0 and maxValue - (number of elements in l).
3. For each number in l, in increasing order, if it is smaller than or equal to x, add 1 to x.
4. Add the adjusted value of x to the sorted list and repeat.
If n is very close to maxValue then you can randomly pick the elements that aren't in the result and then find the complement of that set.
Here is another algorithm that is simpler but has potentially unbounded execution time:
1. Keep a set s of the elements picked so far, initially empty.
2. Pick a number at random between 0 and maxValue.
3. If the number is not in s, add it to s.
4. Go back to step 2 until s has n elements.
In practice if n is small and maxValue is large this will be good enough for most purposes.
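A minimal Python sketch of this retry-on-collision approach, assuming `random.randint` as the source of randomness and a hypothetical function name `sample_by_rejection`:

```python
import random


def sample_by_rejection(n, max_value):
    """Pick n distinct integers from [0, max_value] by retrying on collisions."""
    if n > max_value + 1:
        raise ValueError("cannot pick more distinct values than the range holds")
    chosen = set()
    while len(chosen) < n:
        # Adding a value that is already present leaves the set unchanged,
        # which is exactly the "go back to step 2" case.
        chosen.add(random.randint(0, max_value))
    return chosen


print(sorted(sample_by_rejection(5, 1000)))
```

The loop has no fixed upper bound on its running time, but when n is small compared with maxValue collisions are rare and the expected number of draws stays close to n.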
One way to do it without generating the full array.
Say I want a randomly selected subset of m items from a set {x1, ..., xn} where m <= n.
Consider element x1. I add x1 to my subset with probability m/n.
If I do add x1 to my subset then I reduce my problem to selecting (m - 1) items from {x2, ..., xn}.
If I don't add x1 to my subset then I reduce my problem to selecting m items from {x2, ..., xn}.
Lather, rinse, and repeat until m = 0.
This algorithm is O(n) where n is the number of items I have to consider.
I rather imagine there is an O(m) algorithm where at each step you consider how many elements to remove from the "front" of the set of possibilities, but I haven't convinced myself of a good solution and I have to do some work now!
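The element-by-element procedure described above might be written like this in Python. The name `selection_sample` is illustrative, and `random.random()` is assumed as the uniform source.

```python
import random


def selection_sample(items, m):
    """Choose m of the items, each subset of size m being equally likely."""
    n = len(items)
    if m > n:
        raise ValueError("m must not exceed the number of items")
    picked = []
    for index, item in enumerate(items):
        needed = m - len(picked)      # how many we still have to pick
        remaining = n - index         # how many items are left to consider
        if needed == 0:
            break
        # Take this item with probability needed / remaining.
        if random.random() * remaining < needed:
            picked.append(item)
    return picked


print(selection_sample(list(range(20)), 5))
```

When the number still needed equals the number remaining, the test always succeeds, so the result is guaranteed to contain exactly m items.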
If you are selecting M elements out of N, the strategy changes depending on whether M is of the same order as N or much less (i.e. less than about N/log N).
If they are similar in size, then you go through each item from 1 to N. You keep track of how many items you've got so far (let's call that m items picked out of n that you've gone through), and then you take the next number with probability (M-m)/(N-n) and discard it otherwise. You then update m and n appropriately and continue. This is a O(N) algorithm with low constant cost.
If, on the other hand, M is significantly less than N, then a resampling strategy is a good one. Here you will want to sort M so you can find them quickly (and that will cost you O(M log M) time--stick them into a tree, for example). Now you pick numbers uniformly from 1 to N and insert them into your list. If you find a collision, pick again. You will collide about M/N of the time (actually, you're integrating from 1/N to M/N), which will require you to pick again (recursively), so you'll expect to take M/(1-M/N) selections to complete the process. Thus, your cost for this algorithm is approximately O(M*(N/(N-M))*log(M)).
These are both such simple methods that you can just implement both--assuming you have access to a sorted tree--and pick the one that is appropriate given the fraction of numbers that will be picked.
(Note that picking numbers is symmetric with not picking them, so if M is almost equal to N, then you can use the resampling strategy, but pick those numbers to not include; this can be a win, even if you have to push all almost-N numbers around, if your random number generation is expensive.)
My solution is the same as Mark Byers'. It takes O(n^2) time, hence it's useful when n is much smaller than maxValue. Here's the implementation in python:
import bisect
import random
def pick(n, maxValue):
    chosen = []  # kept sorted
    for i in range(n):
        r = random.randint(0, maxValue - i)
        # skip over every value already chosen that is <= r
        for e in chosen:
            if e <= r:
                r += 1
            else:
                break
        bisect.insort(chosen, r)
    return chosen
The trick is to use a variation of shuffle or in other words a partial shuffle.
function random_pick( a, n )
{
    N = len(a);
    n = min(n, N);
    picked = array_fill(0, n, 0); backup = array_fill(0, n, 0);
    // partially shuffle the array, and generate unbiased selection simultaneously
    // this is a variation on fisher-yates-knuth shuffle
    for (i=0; i<n; i++) // O(n) times
    {
        selected = rand( 0, --N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
        value = a[ selected ];
        a[ selected ] = a[ N ];
        a[ N ] = value;
        backup[ i ] = selected;
        picked[ i ] = value;
    }
    // restore partially shuffled input array from backup
    // optional step, if needed it can be ignored
    for (i=n-1; i>=0; i--) // O(n) times
    {
        selected = backup[ i ];
        value = a[ N ];
        a[ N ] = a[ selected ];
        a[ selected ] = value;
        N++;
    }
    return picked;
}
NOTE: the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffle), and does not need hashmaps (which may not be available and/or usually hide complexity behind their implementation, e.g. fetch time is not O(1) and might even be O(n) in the worst case).
adapted from here
Linear congruential generator modulo maxValue+1. I'm sure I've written this answer before, but I can't find it...
UPDATE: I am wrong. The output of this is not uniformly distributed. Details on why are here.
I think this algorithm below is optimum. I.e. you cannot get better performance than this.
For choosing n numbers out of m numbers, the best offered algorithm so far is presented below. Its worst run time complexity is O(n), and needs only a single array to store the original numbers. It partially shuffles the first n elements from the original array, and then you pick those first n shuffled numbers as your solution.
This is also a fully working C program. What you find is:
Function getrand: This is just a PRNG wrapper that returns a number from 0 up to, but not including, upto.
Function randselect: This is the function that randomly chooses n unique numbers out of m many numbers. This is what this question is about.
Function main: This is only to demonstrate a use for other functions, so that you could compile it into a program and have fun.
#include <stdio.h>
#include <stdlib.h>
/* returns a uniformly distributed number in [0, upto) by rejection */
int getrand(int upto) {
    long int r;
    do {
        r = rand();
    } while (r >= upto);
    return r;
}
/* assumes end <= RAND_MAX */
void randselect(int *all, int end, int select) {
    int c;
    for (c = 0; c < select; c++) {
        /* choose among the positions c..end-1 that are not fixed yet;
           drawing from the whole array each time would bias the result */
        int range = end - c;
        int upto = RAND_MAX - (RAND_MAX % range);
        int binwidth = upto / range;
        /* randomly choose some bin */
        int bin = c + getrand(upto) / binwidth;
        /* swap c with bin */
        int tmp = all[c];
        all[c] = all[bin];
        all[bin] = tmp;
    }
}
int main() {
    int end = 1000;
    int select = 5;
    /* initialize all numbers up to end */
    int *all = malloc(end * sizeof(int));
    int c;
    for (c = 0; c < end; c++) {
        all[c] = c;
    }
    /* select select unique numbers randomly */
    srand(0);
    randselect(all, end, select);
    for (c = 0; c < select; c++) printf("%d ", all[c]);
    putchar('\n');
    free(all);
    return 0;
}
Here is the output of an example code where I randomly output 4 permutations out of a pool of 8 numbers for 100,000,000 many times. Then I use those many permutations to compute the probability of having each unique permutation occur. I then sort them by this probability. You notice that the numbers are fairly close, which I think means that it is uniformly distributed. The theoretical probability should be 1/1680 = 0.000595238095238095. Note how the empirical test is close to the theoretical one.

Resources