How to convert the following recurrence to top-down dynamic programming?

I am trying to solve the Maximum Product Subarray problem from leetcode.
The problem description is: Given an integer array, find the contiguous subarray within the array containing at least one number which has the largest product.
Example: Input: [2,3,-2,4], Output: 6
To solve this I am using the following logic: let f(p, n) output the correct result up to index n of the array, where p is the result accumulated so far. So the recurrence is:
f(p, n) = p                                        // if n == a.length
f(p, n) = max( p, f(p*a[n], n+1), f(a[n], n+1) )   // otherwise
This works for regular recursion (code below).
private int f(int[] a, int p, int n) {
    if(n==a.length)
        return p;
    else
        return max(p, f(a, p*a[n], n+1), f(a, a[n], n+1));
}
However I am having trouble converting it to top-down dynamic programming. The approach I have been using to convert a recursive program into one that uses top-down DP is:
Initialize a cache (I will be using an array)
If cache at index 'n' has been filled return the value as result
Otherwise recurse and store the result in cache
Return value from cache.
This is a general approach that I have been using, and it has worked for most of the DP problems I have done; however, it does not work for this problem.
The (incorrect) code using this approach is shown below:
private int f(int[] a, int p, int n, int[] dp) {
    if(dp[n]!=0)
        return dp[n];
    if(n==a.length)
        dp[n] = p;
    else
        dp[n] = max(p, f(a, p*a[n], n+1, dp), f(a, a[n], n+1, dp));
    return dp[n];
}
I call the functions from the main function as follows:
// int x = f(a, a[0], 1, dp); - for incorrect top-down dp attempt
// int x = f(a, a[0], 1); - for regular recursion
An example where it does not work is: [3,-1,4]. Here it incorrectly outputs 3 instead of 4.
From what I understand, the problem is because both subproblems refer to the same n+1 index of the DP array so only 1 subproblem is solved which results in the incorrect answer.
So my question is:
How can I convert this recurrence to a top-down DP program? Is there a general approach that I can follow for cases like this?

Your dp state depends on both the current index n and the current result p. So you need to memoize the result in a 2D array instead of a 1D array indexed only by n.
private int f(int[] a, int p, int n, int[][] dp) {
    if(dp[n][p]!=0)
        return dp[n][p];
    if(n==a.length)
        dp[n][p] = p;
    else
        dp[n][p] = max(p, f(a, p*a[n], n+1, dp), f(a, a[n], n+1, dp));
    return dp[n][p];
}

You can do it the way you are trying, but I will suggest an easier approach to the problem. It's O(n) time and doesn't even require storing an array, thus O(1) space.
Let us keep 2 variables, min and max, which store the minimum and the maximum product ending at the current position. We keep min because of negative numbers, as two negative numbers can multiply to a large positive number.
The rest is easy:
Initialise min = 1, max = 1 and ans = 0 (as the question says at least one number needs to be taken, you can change this initialisation accordingly, e.g. to the first element).
Start reading the input one element at a time, say 'a', and loop over the length of the array:
if (a > 0) {
    max = a * max;
    min = (1 < min * a) ? 1 : min * a;
} else if (a < 0) {
    temp = max;                        // save the old max before overwriting it
    max = (1 > min * a) ? 1 : min * a;
    min = temp * a;                    // must use the old max here, not the new one
} else {
    max = 1;
    min = 1;
}
ans = (ans > max) ? ans : max;         // this is outside the if/else
At the end of the loop, ans will be the answer. Happy coding :)
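For reference, a hedged, runnable Java version of the same min/max scan (a sketch following the pseudocode above; it keeps the ans = 0 initialisation, so adapt it as noted if the array can be all negatives):

public class MaxProductScan {
    // O(n) time, O(1) extra space; min/max are the extreme products ending here.
    public static int maxProduct(int[] a) {
        int max = 1, min = 1, ans = 0;
        for (int x : a) {
            if (x > 0) {
                max = max * x;
                min = Math.min(1, min * x);
            } else if (x < 0) {
                int oldMax = max;            // save before overwriting
                max = Math.max(1, min * x);
                min = oldMax * x;
            } else {
                max = 1;
                min = 1;
            }
            ans = Math.max(ans, max);
        }
        return ans;
    }

    public static void main(String[] args) {
        System.out.println(maxProduct(new int[]{2, 3, -2, 4})); // prints 6
        System.out.println(maxProduct(new int[]{3, -1, 4}));    // prints 4
    }
}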


How to solve weighted Activity selection with use of Segment Trees and Binary search?

Given N jobs where every job is represented by following three elements of it.
1) Start Time
2) Finish Time.
3) Profit or Value Associated.
Find the maximum profit subset of jobs such that no two jobs in the subset overlap.
I know a dynamic programming solution of complexity O(N^2) (close to LIS, where we just check the previous elements with which we can merge the current interval and take the interval whose merging gives the maximum up to the i-th element). This solution can be further improved to O(N log N) using binary search and simple sorting!
But my friend was telling me that it can even be solved using segment trees and binary search! I have no clue where I would use a segment tree, or how. Can you help?
On request (sorry, it's not commented): what I am doing is sorting on the basis of the starting index, and storing in DP[i] the maximum value obtainable up to i by merging previous intervals and using their maximum obtainable values.
void solve()
{
    int n, i, j, k, high;
    scanf("%d", &n);
    pair<pair<int, int>, int> arr[n + 1]; // first pair holds (l, r); the lone int is the cost
    int dp[n + 1];
    memset(dp, 0, sizeof(dp));
    for (i = 0; i < n; i++)
        scanf("%d%d%d", &arr[i].first.first, &arr[i].first.second, &arr[i].second);
    std::sort(arr, arr + n); // by default sorts on the starting index
    for (i = 0; i < n; i++)
    {
        high = arr[i].second;
        for (j = 0; j < i; j++) // checking all previous mergeable intervals; we use DP[] of the mergeable interval due to optimal substructure
        {
            if (arr[i].first.first >= arr[j].first.second)
                high = std::max(high, dp[j] + arr[i].second);
        }
        dp[i] = high;
    }
    for (i = 0; i < n; i++)
        dp[n - 1] = std::max(dp[n - 1], dp[i]);
    printf("%d\n", dp[n - 1]);
}
int main()
{
    solve();
    return 0;
}
EDIT:
My working code; it finally took me 3 hours to debug! Moreover, this code is slower than the binary search and sorting one due to a larger constant and a bad implementation :P (just for reference)
#include <stdio.h>
#include <algorithm>
#include <vector>
#include <cstring>
#include <iostream>
#include <climits>
#define lc(idx) (2 * idx + 1)
#define rc(idx) (2 * idx + 2)
#define mid(l, r) ((l + r) / 2)
using namespace std;

int Tree[4 * 2 * 10000 - 1];

void update(int L, int R, int qe, int idx, int value)
{
    if (value > Tree[0])
        Tree[0] = value;
    while (L < R)
    {
        if (qe <= mid(L, R))
        {
            idx = lc(idx);
            R = mid(L, R);
        }
        else
        {
            idx = rc(idx);
            L = mid(L, R) + 1;
        }
        if (value > Tree[idx])
            Tree[idx] = value;
    }
    return;
}

int Get(int L, int R, int idx, int q)
{
    if (q < L)
        return 0;
    if (R <= q)
        return Tree[idx];
    return max(Get(L, mid(L, R), lc(idx), q), Get(mid(L, R) + 1, R, rc(idx), q));
}

bool cmp(pair<pair<int, int>, int> A, pair<pair<int, int>, int> B)
{
    return A.first.second < B.first.second;
}

int main()
{
    int N, i;
    scanf("%d", &N);
    pair<pair<int, int>, int> P[N];
    vector<int> V;
    for (i = 0; i < N; i++)
    {
        scanf("%d%d%d", &P[i].first.first, &P[i].first.second, &P[i].second);
        V.push_back(P[i].first.first);
        V.push_back(P[i].first.second);
    }
    sort(V.begin(), V.end());
    for (i = 0; i < N; i++)
    {
        int &l = P[i].first.first, &r = P[i].first.second;
        l = lower_bound(V.begin(), V.end(), l) - V.begin();
        r = lower_bound(V.begin(), V.end(), r) - V.begin();
    }
    sort(P, P + N, cmp);
    int ans = 0;
    memset(Tree, 0, sizeof(Tree));
    for (i = 0; i < N; i++)
    {
        int aux = Get(0, 2 * N - 1, 0, P[i].first.first) + P[i].second;
        if (aux > ans)
            ans = aux;
        update(0, 2 * N - 1, P[i].first.second, 0, ans);
    }
    printf("%d\n", ans);
    return 0;
}
high = arr[i].second;
for (j = 0; j < i; j++) // checking all previous mergeable intervals; note we use DP[] of the mergeable interval due to optimal substructure
{
    if (arr[i].first.first >= arr[j].first.second)
        high = std::max(high, dp[j] + arr[i].second);
}
dp[i] = high;
This inner loop can be done in O(log n) with a segment tree, giving O(n log n) overall.
First of all, let's rewrite it a bit. The max you are taking is a bit complicated, because it takes the maximum of a sum involving both i and j. But i is constant in this part, so let's take it out.
high = 0;
for (j = 0; j < i; j++) // checking all previous mergeable intervals
{
    if (arr[i].first.first >= arr[j].first.second)
        high = std::max(high, dp[j]);
}
dp[i] = high + arr[i].second;
Great, now we have reduced the problem to determining the maximum in [0, i - 1] out of the values that satisfy your if condition.
If we didn't have the if, it would be a simple application of segment trees.
Now there are two choices.
1. Deal with O(log V) query time and O(V) memory for the segment tree
Where V is the maximum size of an interval's endpoint.
You can build a segment tree to which you insert interval start points as you move your i. Then you query over the range of values. Something like this, where the segment tree is initialized to -infinity and of size O(V).
Update(node, index, value):
    if node.associated_interval == [index, index]:
        node.max = value
        return
    if index in node.left.associated_interval:
        Update(node.left, index, value)
    else:
        Update(node.right, index, value)
    node.max = max(node.left.max, node.right.max)

Query(node, left, right):
    if [left, right] does not intersect node.associated_interval:
        return -infinity
    if node.associated_interval included in [left, right]:
        return node.max
    return max(Query(node.left, left, right),
               Query(node.right, left, right))
[...]
high=Query(tree, 0, arr[i].first.first)
dp[i]=high + arr[i].second;
Update(tree, arr[i].first.first, dp[i])
2. Reducing to O(log n) query time and O(n) memory for the segment tree
Since the number of intervals might be significantly smaller than the magnitude of their endpoints, it's reasonable to think that we might be able to encode them better somehow, so that their coordinate range is also O(n). Indeed, we can.
This involves normalizing your intervals into the range [1, 2*n]. Consider the following intervals:
8 100
3 50
90 92
Let's plot them on a line. They'd look like this:
3 8 50 90 92 100
Now replace each of them with their index:
1 2 3 4 5 6
3 8 50 90 92 100
And write your new intervals:
2 6
1 3
4 5
Note that they retain the properties of your initial intervals: the same ones overlap, the same ones are included in each other etc.
This can be done with a sort. You can now apply the same segment tree algorithm, except you declare the segment tree for the size 2*n.
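As a hedged illustration of this normalization step (in Java; the class and helper names are mine), coordinate compression can be done like this:

import java.util.Arrays;
import java.util.TreeMap;

public class CompressEndpoints {
    // Replace every endpoint with its rank among all distinct endpoints;
    // order is preserved, so overlap/containment relations are unchanged.
    static int[][] compress(int[][] intervals) {
        int[] points = new int[2 * intervals.length];
        int k = 0;
        for (int[] iv : intervals) {
            points[k++] = iv[0];
            points[k++] = iv[1];
        }
        Arrays.sort(points);
        TreeMap<Integer, Integer> rank = new TreeMap<>();
        for (int p : points)
            rank.putIfAbsent(p, rank.size() + 1); // ranks 1..2n, duplicates collapsed
        int[][] out = new int[intervals.length][2];
        for (int i = 0; i < intervals.length; i++) {
            out[i][0] = rank.get(intervals[i][0]);
            out[i][1] = rank.get(intervals[i][1]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] in = {{8, 100}, {3, 50}, {90, 92}};
        System.out.println(Arrays.deepToString(compress(in)));
        // [[2, 6], [1, 3], [4, 5]] -- the example from the answer above
    }
}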

Converting this recursive solution to DP

Given a stack of integers, players take turns at removing either 1, 2, or 3 numbers from the top of the stack. Assuming that the opponent plays optimally and you select first, I came up with the following recursion:
int score(int n) {
    if (n <= 0) return 0;
    if (n <= 3) {
        return sum(v[0..n-1]);
    }
    // maximize over picking 1, 2, or 3 + value after opponent picks optimally
    return max(v[n-1] + min(score(n-2), score(n-3), score(n-4)),
               v[n-1] + v[n-2] + min(score(n-3), score(n-4), score(n-5)),
               v[n-1] + v[n-2] + v[n-3] + min(score(n-4), score(n-5), score(n-6)));
}
Basically, at each level comparing the outcomes of selecting 1, 2, or 3 and then your opponent selecting either 1, 2, or 3.
I was wondering how I could convert this to a DP solution as it is clearly exponential. I was struggling with the fact that there seem to be 3 dimensions to it: num of your pick, num of opponent's pick, and sub problem size, i.e., it seems the best solution for table[p][o][n] would need to be maintained, where p is the number of values you choose, o is the number your opponent chooses and n is the size of the sub problem.
Do I actually need the 3 dimensions? I have seen this similar problem: http://www.geeksforgeeks.org/dynamic-programming-set-31-optimal-strategy-for-a-game/ , but couldn't seem to adapt it.
Here is a way the problem can be converted into DP:
score[i] = max{ sum[i] - score[i+1], sum[i] - score[i+2], sum[i] - score[i+3] }
Here score[i] means the max score obtainable from the game [i..n], where v[i] is the top of the stack. sum[i] is the sum of all elements on the stack from i onwards; sum[i] can be evaluated using a separate DP in O(N). The above DP can be solved using a table in O(N).
Edit:
Following is a DP solution in Java:
public class game {
    static boolean play_game(int[] stack) {
        if (stack.length <= 3)
            return true;
        int[] score = new int[stack.length];
        int n = stack.length;
        score[n-1] = stack[n-1];
        score[n-2] = score[n-1] + stack[n-2];
        score[n-3] = score[n-2] + stack[n-3];
        int sum = score[n-3];
        for (int i = n-4; i >= 0; i--) {
            sum = stack[i] + sum;
            int min = Math.min(Math.min(score[i+1], score[i+2]), score[i+3]);
            score[i] = sum - min;
        }
        if (sum - score[0] < score[0])
            return true;
        return false;
    }

    public static void main(String args[]) {
        int[] stack = {12, 1, 7, 99, 3};
        System.out.printf("I win => " + play_game(stack));
    }
}
EDIT:
To get a DP solution you need to visualize the problem's solution in terms of smaller instances of itself. In this case, since both players play optimally, after the first player's choice the second player also obtains an optimal score for the remaining stack, which is a subproblem of the original. The only difficulty is representing this as a recurrence: to solve with DP you must first define a recurrence relation in terms of subproblems that precede the current problem in the order of computation. Now, whatever the second player wins, the first player loses, so effectively the first player gains the total sum minus the second player's score. Since the second player also plays optimally, we can express the solution recursively, as above.

finding the position of a fraction in farey sequence

For finding the position of a fraction in the Farey sequence, I tried to implement the algorithm given here http://www.math.harvard.edu/~corina/publications/farey.pdf under "initial algorithm", but I can't understand where I'm going wrong; I am not getting the correct answers. Could someone please point out my mistake?
E.g. for order n = 7 and the fractions 1/7 and 1/6 I get the same answer.
Here's what I've tried for a given order (n) and a fraction a/b:
sum = 0;
int A[100000];
A[1] = a;
for (i = 2; i <= n; i++)
    A[i] = i*a - a;
for (i = 2; i <= n; i++)
{
    for (j = i+i; j <= n; j += i)
        A[j] -= A[i];
}
for (i = 1; i <= n; i++)
    sum += A[i];
ans = sum/b;
Thanks.
Your algorithm doesn't use any particular properties of a and b. In the first part, every relevant entry of the array A is a multiple of a, but the factor is independent of a, b and n. Setting up the array ignoring the factor a, i.e. starting with A[1] = 1, A[i] = i-1 for 2 <= i <= n, after the nested loops, the array contains the totients, i.e. A[i] = phi(i), no matter what a, b, n are. The sum of the totients from 1 to n is the number of elements of the Farey sequence of order n (plus or minus 1, depending on which of 0/1 and 1/1 are included in the definition you use). So your answer is always the approximation (a*number of terms)/b, which is close but not exact.
I've not yet looked at how yours relates to the algorithm in the paper; check back for updates later.
Addendum: Finally had time to look at the paper. Your initialisation is not what they give. In their algorithm, A[q] is initialised to floor(x*q); for a rational x = a/b, the correct initialisation is
for(i = 1; i <= n; ++i){
    A[i] = (a*i)/b;
}
In the remainder of your code, only ans = sum/b; has to be changed to ans = sum;.
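Putting the correction together, here is a hedged Java sketch of the whole rank computation (my own rendering of the paper's initial algorithm, not the poster's code; it counts 0/1 as position 1, so shift by one if your convention differs):

public class FareyRank {
    // Position of a/b (in lowest terms) in the Farey sequence of order n.
    // A[q] starts as floor(a*q/b), the number of fractions with denominator q
    // that are <= a/b; the sieve subtracts the ones not in lowest terms.
    static long rank(long a, long b, int n) {
        long[] A = new long[n + 1];
        for (int q = 1; q <= n; q++)
            A[q] = a * q / b;
        for (int q = 1; q <= n; q++)
            for (int m = 2 * q; m <= n; m += q)
                A[m] -= A[q];
        long sum = 1; // counts 0/1
        for (int q = 1; q <= n; q++)
            sum += A[q];
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(rank(1, 7, 7)); // 2 (the order-7 sequence starts 0/1, 1/7, 1/6, ...)
        System.out.println(rank(1, 6, 7)); // 3
    }
}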
A non-algorithmic way of finding the position t of a fraction in the Farey sequence of order n > 1 is shown in Remark 7.10(ii)(a) of the paper, under m := n - 1, where mu-bar stands for the number-theoretic Möbius function on positive integers, taking values in {-1, 0, 1}.
Here's my Java solution that works. Add head (0/1) and tail (1/1) nodes to a singly linked list, then start by passing headNode and tailNode and setting the required orderLevel.
public void generateSequence(Node leftNode, Node rightNode) {
    Fraction left = (Fraction) leftNode.getData();
    Fraction right = (Fraction) rightNode.getData();
    FractionNode midNode = null;
    int midNum = left.getNum() + right.getNum();
    int midDenom = left.getDenom() + right.getDenom();
    if (midDenom <= getMaxLevel()) {
        Fraction middle = new Fraction(midNum, midDenom);
        midNode = new FractionNode(middle);
    }
    if (midNode != null) {
        leftNode.setNext(midNode);
        midNode.setNext(rightNode);
        generateSequence(leftNode, midNode);
        count++;
    } else if (rightNode.next() != null) {
        generateSequence(rightNode, rightNode.next());
    }
}

Find unique common element from 3 arrays

Original Problem:
I have 3 boxes, each containing 200 coins. Only one person has made calls from all three boxes, so there is exactly one coin in each box bearing the same fingerprints, while all other coins carry different fingerprints. You have to find the coin with the same fingerprint in each of the 3 boxes, so that we can identify the fingerprint of the person who made calls from all three.
Converted problem:
You have 3 arrays containing 200 integers each. Given that there is one and only one common element in these 3 arrays. Find the common element.
Please consider solutions other than the trivial O(1)-space, O(n^3)-time approach.
Some improvement in Pelkonen's answer:
From converted problem in OP:
"Given that there is one and only one common element in these 3 arrays."
We need to sort only 2 of the arrays and can then find the common element.
If you sort all the arrays first, O(n log n), then it will be pretty easy to find the common element in less than O(n^3) time. You can, for example, use binary search after sorting them.
Let N = 200, k = 3,
Create a hash table H with capacity ≥ Nk.
For each element X in array 1, set H[X] to 1.
For each element Y in array 2, if Y is in H and H[Y] == 1, set H[Y] = 2.
For each element Z in array 3, if Z is in H and H[Z] == 2, return Z.
throw new InvalidDataGivenByInterviewerException();
O(Nk) time, O(Nk) space complexity.
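A hedged Java sketch of these steps (names are mine; the exception mirrors the joke above):

import java.util.HashMap;
import java.util.Map;

public class UniqueCommonElement {
    // Pass tagging: an element advances from tag 1 to tag 2 to "found"
    // only if it appears in array 1, then array 2, then array 3.
    static int findCommon(int[] a1, int[] a2, int[] a3) {
        Map<Integer, Integer> h = new HashMap<>(3 * a1.length);
        for (int x : a1) h.put(x, 1);
        for (int y : a2) if (h.getOrDefault(y, 0) == 1) h.put(y, 2);
        for (int z : a3) if (h.getOrDefault(z, 0) == 2) return z;
        throw new IllegalStateException("invalid data given by interviewer");
    }

    public static void main(String[] args) {
        System.out.println(findCommon(
                new int[]{1, 5, 7}, new int[]{5, 9, 2}, new int[]{4, 5, 0})); // 5
    }
}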
Use a hash table for each integer and encode the entries such that you know which array it's coming from - then check for the slot which has entries from all 3 arrays. O(n)
Use a hashtable mapping objects to frequency counts. Iterate through all three lists, incrementing occurrence counts in the hashtable, until you encounter one with an occurrence count of 3. This is O(n), since no sorting is required. Example in Python:
def find_duplicates(*lists):
    num_lists = len(lists)
    counts = {}
    for l in lists:
        for i in l:
            counts[i] = counts.get(i, 0) + 1
            if counts[i] == num_lists:
                return i
Or an equivalent, using sets:
def find_duplicates(*lists):
    intersection = set(lists[0])
    for l in lists[1:]:
        intersection = intersection.intersection(set(l))
    return intersection.pop()
O(N) solution: use a hash table. H[i] = list of all integers in the three arrays that map to i.
For every bucket H[i] with more than one entry, check whether three of its values are the same. If yes, you have your solution. You can do this check even with the naive method and it should still be very fast, or you can sort those H[i], and then it becomes trivial.
If your numbers are relatively small, you can use H[i] = k if i appears k times in the three arrays, then the solution is the i for which H[i] = 3. If your numbers are huge, use a hash table though.
You can extend this to work even if you can have elements that can be common to only two arrays and also if you can have elements repeating elements in one of the arrays. It just becomes a bit more complicated, but you should be able to figure it out on your own.
If you want the fastest* answer:
Sort one array--time is N log N.
For each element in the second array, search the first. If you find it, add 1 to a companion array; otherwise add 0--time is N log N, using N space.
For each non-zero count, copy the corresponding entry into the temporary array, compacting it so it's still sorted--time is N.
For each element in the third array, search the temporary array; when you find a hit, stop. Time is less than N log N.
Here's code in Scala that illustrates this:
import java.util.Arrays
val a = Array(1,5,2,3,14,1,7)
val b = Array(3,9,14,4,2,2,4)
val c = Array(1,9,11,6,8,3,1)
Arrays.sort(a)
val count = new Array[Int](a.length)
for (i <- 0 until b.length) {
val j =Arrays.binarySearch(a,b(i))
if (j >= 0) count(j) += 1
}
var n = 0
for (i <- 0 until count.length) if (count(i)>0) { count(n) = a(i); n+= 1 }
for (i <- 0 until c.length) {
if (Arrays.binarySearch(count,0,n,c(i))>=0) println(c(i))
}
With slightly more complexity, you can either use no extra space at the cost of being even more destructive of your original arrays, or you can avoid touching your original arrays at all at the cost of another N space.
Edit: * as the comments have pointed out, hash tables are faster for non-perverse inputs. This is "fastest worst case". The worst case may not be so unlikely unless you use a really good hashing algorithm, which may well eat up more time than your sort. For example, if you multiply all your values by 2^16, the trivial hashing (i.e. just use the bitmasked integer as an index) will collide every time on lists shorter than 64k....
// Beginner's code using binary search; pretty easy
bool BS(int arr[], int low, int high, int target)
{
    if (low > high)
        return false;
    int mid = low + (high - low) / 2;
    if (target == arr[mid])
        return true;
    else if (target < arr[mid])
        return BS(arr, low, mid - 1, target); // these returns were missing originally
    else
        return BS(arr, mid + 1, high, target);
}

vector<int> commonElements(int A[], int B[], int C[], int n1, int n2, int n3)
{
    vector<int> ans;
    for (int i = 0; i < n2; i++)
    {
        // skip duplicates in B
        if (i > 0 && B[i-1] == B[i])
            continue;
        // search each element of array B in both arrays A and C
        if (BS(A, 0, n1 - 1, B[i]) && BS(C, 0, n3 - 1, B[i]))
        {
            ans.push_back(B[i]);
        }
    }
    return ans;
}

Algorithm to select a single, random combination of values?

Say I have y distinct values and I want to select x of them at random. What's an efficient algorithm for doing this? I could just call rand() x times, but the performance would be poor if x, y were large.
Note that combinations are needed here: each value should have the same probability to be selected but their order in the result is not important. Sure, any algorithm generating permutations would qualify, but I wonder if it's possible to do this more efficiently without the random order requirement.
"How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N" covers this case for permutations.
Robert Floyd invented a sampling algorithm for just such situations. It's generally superior to shuffling then grabbing the first x elements since it doesn't require O(y) storage. As originally written it assumes values from 1..N, but it's trivial to produce 0..N and/or use non-contiguous values by simply treating the values it produces as subscripts into a vector/array/whatever.
In pseudocode, the algorithm runs like this (borrowing from Jon Bentley's Programming Pearls column "A Sample of Brilliance").
initialize set S to empty
for J := N-M + 1 to N do
    T := RandInt(1, J)
    if T is not in S then
        insert T in S
    else
        insert J in S
That last bit (inserting J if T is already in S) is the tricky part. The bottom line is that it assures the correct mathematical probability of inserting J so that it produces unbiased results.
It's O(x)¹ and O(1) with regard to y, with O(x) storage.
Note that, in accordance with the combinations tag in the question, the algorithm only guarantees equal probability of each element occurring in the result, not of their relative order in it.
¹ O(x²) in the worst case for the hash map involved, which can be neglected since it's a virtually nonexistent pathological case where all the values have the same hash.
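A hedged Java rendering of Floyd's algorithm for 0-based values (a sketch of the pseudocode above, sampling x distinct integers from 0..y-1; the names are mine):

import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

public class FloydSample {
    // Samples x distinct integers uniformly from {0, ..., y-1}.
    static Set<Integer> sample(Random rng, int y, int x) {
        Set<Integer> s = new LinkedHashSet<>();
        for (int j = y - x; j < y; j++) {
            int t = rng.nextInt(j + 1); // uniform in 0..j
            if (!s.add(t))              // t was already chosen:
                s.add(j);               // insert j instead -- this keeps the sample unbiased
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sample(new Random(), 1000000, 5));
    }
}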
Assuming that you want the order to be random too (or don't mind it being random), I would just use a truncated Fisher-Yates shuffle. Start the shuffle algorithm, but stop once you have selected the first x values, instead of "randomly selecting" all y of them.
Fisher-Yates works as follows:
select an element at random, and swap it with the element at the end of the array.
Recurse (or more likely iterate) on the remainder of the array, excluding the last element.
Steps after the first do not modify the last element of the array. Steps after the first two don't affect the last two elements. Steps after the first x don't affect the last x elements. So at that point you can stop - the top of the array contains uniformly randomly selected data. The bottom of the array contains somewhat randomized elements, but the permutation you get of them is not uniformly distributed.
Of course this means you've trashed the input array - if this means you'd need to take a copy of it before starting, and x is small compared with y, then copying the whole array is not very efficient. Do note though that if all you're going to use it for in future is further selections, then the fact that it's in somewhat-random order doesn't matter, you can just use it again. If you're doing the selection multiple times, therefore, you may be able to do only one copy at the start, and amortise the cost.
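A minimal Java sketch of such a truncated Fisher-Yates shuffle (my own names; as discussed, it reorders the input array):

import java.util.Arrays;
import java.util.Random;

public class TruncatedShuffle {
    // Moves x uniformly chosen elements to the end of a and returns them.
    static int[] takeRandom(int[] a, int x, Random rng) {
        for (int i = a.length - 1; i >= a.length - x; i--) {
            int j = rng.nextInt(i + 1); // pick from the not-yet-fixed prefix
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
        return Arrays.copyOfRange(a, a.length - x, a.length);
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 50, 60, 70};
        System.out.println(Arrays.toString(takeRandom(data, 3, new Random())));
    }
}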
If you really only need to generate combinations - where the order of elements does not matter - you may use combinadics as they are implemented e.g. here by James McCaffrey.
Contrast this with k-permutations, where the order of elements does matter.
In the first case (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1) are considered the same - in the latter, they are considered distinct, though they contain the same elements.
In case you need combinations, you may really only need to generate one random number (albeit it can be a bit large); that can be used directly to find the m-th combination.
Since this random number represents the index of a particular combination, it follows that your random number should be between 0 (inclusive) and C(n,k) (exclusive).
Calculating combinadics might take some time as well.
It might just not be worth the trouble; besides, Jerry's and Federico's answers are certainly simpler than implementing combinadics.
However if you really only need a combination and you are bugged about generating the exact number of random bits that are needed and none more... ;-)
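For the curious, a hedged sketch of this unranking idea in Java (helper names are mine; it maps an index m in [0, C(n,k)) to the m-th k-combination of {0..n-1} in lexicographic order):

public class Combinadic {
    static long binomial(int n, int k) {
        if (k < 0 || k > n) return 0;
        long c = 1;
        for (int i = 0; i < k; i++)
            c = c * (n - i) / (i + 1); // exact at every step
        return c;
    }

    // m-th (0-based, lexicographic) k-combination of {0, ..., n-1}
    static int[] unrank(long m, int n, int k) {
        int[] comb = new int[k];
        int next = 0; // smallest value still available
        for (int i = 0; i < k; i++) {
            while (true) {
                long block = binomial(n - 1 - next, k - 1 - i); // combos whose i-th element is 'next'
                if (m < block) break;
                m -= block;
                next++;
            }
            comb[i] = next++;
        }
        return comb;
    }

    public static void main(String[] args) {
        for (long m = 0; m < binomial(4, 2); m++) // all C(4,2) = 6 combinations of {0,1,2,3}
            System.out.println(java.util.Arrays.toString(unrank(m, 4, 2)));
    }
}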
While it is not clear whether you want combinations or k-permutations, here is C# code for the latter (yes, we could generate only a complement if x > y/2, but then we would be left with a combination that must be shuffled to get a real k-permutation):
static class TakeHelper
{
    public static IEnumerable<T> TakeRandom<T>(
        this IEnumerable<T> source, Random rng, int count)
    {
        T[] items = source.ToArray();
        count = count < items.Length ? count : items.Length;
        for (int i = items.Length - 1; count-- > 0; i--)
        {
            int p = rng.Next(i + 1);
            yield return items[p];
            items[p] = items[i];
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        Random rnd = new Random(Environment.TickCount);
        int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7 };
        foreach (int number in numbers.TakeRandom(rnd, 3))
        {
            Console.WriteLine(number);
        }
    }
}
Another, more elaborate implementation that generates k-permutations, that I had lying around and I believe is in a way an improvement over existing algorithms if you only need to iterate over the results. While it also needs to generate x random numbers, it only uses O(min(y/2, x)) memory in the process:
/// <summary>
/// Generates unique random numbers
/// <remarks>
/// Worst case memory usage is O(min((emax-imin)/2, num))
/// </remarks>
/// </summary>
/// <param name="random">Random source</param>
/// <param name="imin">Inclusive lower bound</param>
/// <param name="emax">Exclusive upper bound</param>
/// <param name="num">Number of integers to generate</param>
/// <returns>Sequence of unique random numbers</returns>
public static IEnumerable<int> UniqueRandoms(
    Random random, int imin, int emax, int num)
{
    int dictsize = num;
    long half = (emax - (long)imin + 1) / 2;
    if (half < dictsize)
        dictsize = (int)half;
    Dictionary<int, int> trans = new Dictionary<int, int>(dictsize);
    for (int i = 0; i < num; i++)
    {
        int current = imin + i;
        int r = random.Next(current, emax);
        int right;
        if (!trans.TryGetValue(r, out right))
        {
            right = r;
        }
        int left;
        if (trans.TryGetValue(current, out left))
        {
            trans.Remove(current);
        }
        else
        {
            left = current;
        }
        if (r > current)
        {
            trans[r] = left;
        }
        yield return right;
    }
}
The general idea is to do a Fisher-Yates shuffle and memorize the transpositions in the permutation.
It was not published anywhere, nor has it received any peer review whatsoever. I believe it is a curiosity rather than something of practical value. Nonetheless, I am very open to criticism and would generally like to know if you find anything wrong with it; please consider this (and add a comment) before downvoting.
A little suggestion: if x >> y/2, it's probably better to select at random y - x elements, then choose the complementary set.
The trick is to use a variation of shuffle, in other words a partial shuffle.
function random_pick( a, n )
{
    N = len(a);
    n = min(n, N);
    picked = array_fill(0, n, 0); backup = array_fill(0, n, 0);
    // partially shuffle the array, and generate unbiased selection simultaneously
    // this is a variation on the fisher-yates-knuth shuffle
    for (i=0; i<n; i++) // O(n) times
    {
        selected = rand( 0, --N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
        value = a[ selected ];
        a[ selected ] = a[ N ];
        a[ N ] = value;
        backup[ i ] = selected;
        picked[ i ] = value;
    }
    // restore partially shuffled input array from backup
    // optional step, if needed it can be ignored
    for (i=n-1; i>=0; i--) // O(n) times
    {
        selected = backup[ i ];
        value = a[ N ];
        a[ N ] = a[ selected ];
        a[ selected ] = value;
        N++;
    }
    return picked;
}
NOTE: the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffle), and is non-destructive on the input array thanks to the restore step (otherwise it behaves as a partial shuffle), but that step is optional.
adapted from here
update
Another approach, using only a single call to the PRNG (pseudo-random number generator) in [0,1], is given by Ivan Stojmenovic, "On Random and Adaptive Parallel Generation of Combinatorial Objects" (section 3), with O(N) (worst-case) complexity.
Here is a simple way to do it which is only inefficient if Y is much larger than X.
void randomly_select_subset(
    int X, int Y,
    const int * inputs, int * outputs
) {
    int i, r;
    for( i = 0; i < X; ++i ) outputs[i] = inputs[i];
    for( i = X; i < Y; ++i ) {
        r = rand_inclusive( 0, i );
        if( r < X ) outputs[r] = inputs[i];
    }
}
Basically, copy the first X of your distinct values to your output array, and then for each remaining value, randomly decide whether or not to include it. The random number doubles as the index of the element of our (mutable) output array to replace.
If, for example, you have 2^64 distinct values, you can use a symmetric-key block cipher (with a 64-bit block) to quickly shuffle all combinations, for example Blowfish.
for(i=0; i<x; i++)
e[i] = encrypt(key, i)
This is not random in the pure sense but can be useful for your purpose.
If you want to work with an arbitrary number of distinct values using cryptographic techniques, you can, but it's more complex.
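A hedged Java sketch of this idea (Blowfish is used purely because it's mentioned above; the key is a placeholder, and any cipher with a 64-bit block would do):

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;

public class CipherPermutation {
    // Blowfish has a 64-bit block, so encrypting the counter i under a fixed
    // key walks a key-dependent permutation of all 2^64 values.
    public static void main(String[] args) throws Exception {
        SecretKeySpec key = new SecretKeySpec("some secret key!".getBytes(), "Blowfish");
        Cipher cipher = Cipher.getInstance("Blowfish/ECB/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        for (long i = 0; i < 5; i++) { // e[i] = encrypt(key, i), as in the snippet above
            byte[] block = ByteBuffer.allocate(8).putLong(i).array();
            long e = ByteBuffer.wrap(cipher.doFinal(block)).getLong();
            System.out.println(e);
        }
    }
}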
