Get a random element in single direction linked list by one time traverse - algorithm

I have a singly linked list whose size I don't know.
I want to get a random element from this list, and I can traverse the list only once. (I am not allowed to traverse it twice or more.)
What's the algorithm for this problem? Thanks!

This is just reservoir sampling with a reservoir of size 1.
Essentially it is really simple
Pick the first element regardless (for a list of length 1, the first element is always the sample).
For every subsequent element, replace the already picked element with the current one with probability 1/n, where n is the number of elements observed so far (including the current one).
The result is uniformly sampled, since every element of a list of length n ends up as the final pick with probability 1/n (exercise for the reader).
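The steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the `Node` class and the name `random_element` are made up for illustration, and the optional `rng` parameter is only there so that a seeded generator can be passed in.

```python
import random


class Node:
    """Hypothetical singly linked list node, used only for this sketch."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def random_element(head, rng=random):
    """Return the data of a uniformly random node using a single traversal."""
    chosen = None
    seen = 0
    node = head
    while node is not None:
        seen += 1
        # Replace the current pick with probability 1/seen.
        # For the first node this always succeeds (randrange(1) == 0).
        if rng.randrange(seen) == 0:
            chosen = node.data
        node = node.next
    return chosen
```

An empty list yields `None`; the list is never traversed a second time and its length is never needed in advance.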

This is probably an interview question. Reservoir sampling is used by data scientists to keep a representative sample of a large data stream in limited storage.
If you have to collect k elements from a stream of n elements such that every element ends up in the collection with the same probability (k/n), you follow two steps:
1) Store the first k elements in the storage.
2) When the next element arrives from the stream (element number i, starting with i = k+1), the storage is already full. Generate a random number l from 0 to i-1; if l is less than k, replace storage[l] with the new element, otherwise discard the new element.
Now, coming back to your question: here the storage size is 1. So you pick the first node, then iterate over the list. For the second element, generate a random number that is either 0 or 1; if it's 1, leave the sample alone, otherwise replace the stored element with the current node. Continue in the same way for the rest of the list, drawing from 0 to i-1 for the i-th node.
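A short Python sketch of the general k-element procedure, assuming the stream is any iterable. The function name `reservoir_sample` and the `rng` parameter are illustrative choices, not part of the answer above.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Keep k items from a stream of unknown length, each with equal probability."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Step 1: the first k items fill the storage.
            reservoir.append(item)
        else:
            # Step 2: i items were seen before this one, so draw from 0..i.
            slot = rng.randrange(i + 1)
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

With `k=1` this reduces to the single-element case from the question. If the stream holds fewer than k items, all of them are returned.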

This question can be solved using reservoir sampling. It chooses k random items out of n items, where n can be very large (the items don't have to fit in memory!) and, as in your case, unknown initially.
Wikipedia has an understandable algorithm, which I quote below:
array R[k]; // result
integer i, j;
// fill the reservoir array
for each i in 1 to k do
    R[i] := S[i]
done;
// replace elements with gradually decreasing probability
for each i in k+1 to length(S) do
    j := random(1, i); // important: inclusive range
    if j <= k then
        R[j] := S[i]
    fi
done
The question requires only 1 value so we take k=1.
C implementation :
https://ideone.com/txnsas

This is the easiest way that I have found, it works fine and is understandable:
public int findrandom(Node start) {
    Node curr = start;
    int count = 1, result = 0, probability;
    Random rand = new Random();
    while (curr != null) {
        probability = rand.nextInt(count) + 1;
        if (count == probability)
            result = curr.data;
        count++;
        curr = curr.next;
    }
    return result;
}

Related

Finding the sum of maximum difference possible from all subintervals of a given array

I have a problem designing an algorithm. The problem is that this should be executed in O(n) time.
Here is the assignment:
There is an unsorted array "a" with n numbers.
$m_{ij} = \min\{a_i, a_{i+1}, \dots, a_j\}$, $M_{ij} = \max\{a_i, a_{i+1}, \dots, a_j\}$
Calculate:
$S = \sum_{i=1}^{n} \sum_{j=i}^{n} (M_{ij} - m_{ij})$
I am able to solve this in O(nlogn) time. This is a university research assignment. Everything that I tried suggests that this is not possible. I would be very thankful if you could point me in the right direction where to find the solution. Or at least prove that this is not possible.
Further explanation:
Given i and j, find the maximum and minimum elements of the array slice a[i:j]. Subtract those to get the range of the slice, a[max]-a[min].
Now, add up the ranges of all slices for all (i, j) such that 1 <= i <= j <= n. Do it in O(n) time.
This is a pretty straightforward problem.
I will assume that the array holds objects (like pairs or tuples), not bare numbers. The first field is the index in the array and the second is the value.
The right question here is how many times each number has to be added to or subtracted from the sum, i.e. in how many subarrays it is the maximum element and in how many it is the minimum element.
This problem is connected to finding the next greater element (NGE), you can see here, just to know it for future problems.
I will write it in pseudocode.
subsum(A):
    returnSum = 0
    // I am pushing objects onto the stack. The first field is the index in the array, the second is the value.
    // The sentinel is larger than any element, so the comparison below never pops it.
    stack.push((-1, Integer.MAX_VALUE))
    for (int i = 0; i < n; i++)
        while (stack.peek().value < A[i].value)
            last = stack.pop()
            beforeLast = stack.peek()
            returnSum = returnSum + last.value*(i-last.index)*(last.index-beforeLast.index)
        stack.push(A[i])
    // everything still on the stack (except the sentinel) is the maximum up to the end of the array
    while stack.size() > 1:
        last = stack.pop()
        beforeLast = stack.peek()
        returnSum = returnSum + last.value*(A.length-last.index)*(last.index-beforeLast.index)
    return returnSum
sum(A):
    // first we calculate the sum of the maximum values over all subarrays, and then the sum of the minimum values. The latter is done by simply multiplying each value in the array by -1
    return subsum(A) + subsum(A with every value multiplied by -1)
Time complexity of this code is O(n), since every element is pushed and popped at most once.
The peek function just reads the top value of the stack without popping it.
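The pseudocode above can be turned into a runnable Python sketch. It stores plain indices on the stack instead of (index, value) objects and uses an empty stack in place of the sentinel; both are choices made for this sketch. The brute-force function is included only to cross-check the result on small inputs.

```python
def sum_of_maxima(values):
    """Sum of max(values[i..j]) over all subarrays, using a monotonic stack."""
    total = 0
    stack = []  # indices whose values are non-increasing from bottom to top
    n = len(values)
    for i in range(n + 1):
        # At i == n everything left on the stack is flushed.
        while stack and (i == n or values[stack[-1]] < values[i]):
            last = stack.pop()
            before_last = stack[-1] if stack else -1
            # values[last] is the maximum of (i - last) * (last - before_last) subarrays.
            total += values[last] * (i - last) * (last - before_last)
        if i < n:
            stack.append(i)
    return total


def sum_of_ranges(values):
    """Sum of (max - min) over all subarrays in O(n)."""
    # The sum of minima equals minus the sum of maxima of the negated array.
    return sum_of_maxima(values) + sum_of_maxima([-v for v in values])


def brute_force(values):
    """Reference implementation that enumerates every subarray."""
    n = len(values)
    return sum(max(values[i:j + 1]) - min(values[i:j + 1])
               for i in range(n) for j in range(i, n))
```

For `[1, 2, 3]` the only non-zero ranges come from `{1,2}`, `{2,3}` and `{1,2,3}`, giving 1 + 1 + 2 = 4.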

Maximum of all possible subarrays of an array

How do I find/store maximum/minimum of all possible non-empty sub-arrays of an array of length n?
I built a segment tree over the array and then queried it for each possible subarray, but that's not efficient. How do I do it in O(n)?
P.S. n <= 10^7
E.g. arr[] = { 1, 2, 3 }; // the array need not be sorted
sub-array   min   max
{1}         1     1
{2}         2     2
{3}         3     3
{1,2}       1     2
{2,3}       2     3
{1,2,3}     1     3
I don't think it is possible to store all those values in O(n), since there are n(n+1)/2 subarrays. But it is pretty easy to create, in O(n), a structure that answers in O(1) the query "how many subarrays are there where A[i] is the maximum element".
Naïve version:
Think about the naïve strategy: to know how many such subarrays there are for some A[i], you could employ a simple O(n) algorithm that counts how many consecutive elements directly to the left and directly to the right of A[i] are less than A[i]. Let's say:
A = [... 10 1 1 1 5 1 1 10 ...]
The 5 here has 3 smaller elements directly to its left and 2 directly to its right. From this we know there are 4*3=12 subarrays for which that very 5 is the maximum: 4*3 because a subarray can extend over 0..3 elements to the left (4 choices) and over 0..2 elements to the right (3 choices).
Optimized version:
This naïve version of the check would take O(n) operations for each element, so O(n^2) after all. Wouldn't it be nice if we could compute all these lengths in O(n) in a single pass?
Luckily there is a simple algorithm for that. Just use a stack. Traverse the array normally (from left to right). Push every element's index onto the stack, but before pushing it, pop all the indexes whose values are less than or equal to the current value. The index left on top of the stack is then the nearest larger element to the left.
To find the same bounds on the right, just traverse the array backwards (popping only strictly smaller values there, so that subarrays with duplicate maxima are not counted twice).
Here's a sample Python proof of concept that shows this algorithm in action. I also implemented the naïve version so we can cross-check the result from the optimized version:
from random import choice
from collections import defaultdict, deque

def make_bounds(A, fallback, arange, op):
    # For every index, record the nearest earlier index (in traversal order)
    # that survives the popping rule given by op.
    stack = deque()
    bound = [fallback] * len(A)
    for i in arange:
        while stack and op(A[stack[-1]], A[i]):
            stack.pop()
        if stack:
            bound[i] = stack[-1]
        stack.append(i)
    return bound

def optimized_version(A):
    T = list(zip(make_bounds(A, -1, range(len(A)), lambda x, y: x <= y),
                 make_bounds(A, len(A), reversed(range(len(A))), lambda x, y: x < y)))
    answer = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left, right = T[i]
        answer[x] += (i - left) * (right - i)
    return dict(answer)

def naive_version(A):
    answer = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left = next((j for j in range(i - 1, -1, -1) if A[j] > A[i]), -1)
        right = next((j for j in range(i + 1, len(A)) if A[j] >= A[i]), len(A))
        answer[x] += (i - left) * (right - i)
    return dict(answer)

A = [choice(range(32)) for i in range(8)]
MA1 = naive_version(A)
MA2 = optimized_version(A)
print('Array:    ', A)
print('Naive:    ', MA1)
print('Optimized:', MA2)
print('OK:       ', MA1 == MA2)
I don't think it is possible to do it directly in O(n) time: you need to iterate over all the elements of the subarrays, and you have n of them. Unless the subarrays are sorted.
On the other hand, when initialising the subarrays, instead of making them normal arrays you could build heaps: min heaps when you want to find the minimum and max heaps when you want to find the maximum.
Building a heap is a linear time operation, and retrieving the maximum and minimum respectively for a max heap and min heap is a constant time operation, since those elements are found at the first place of the heap.
Heaps can be easily implemented just using a normal array.
Check this article on Wikipedia about binary heaps: https://en.wikipedia.org/wiki/Binary_heap.
I do not understand what exactly you mean by maximum of sub-arrays, so I will assume you are asking for one of the following:
1. The subarray of maximum/minimum length or some other criterion (in which case the problem reduces to finding the max element in a one-dimensional array)
2. The maximum elements of all your sub-arrays, either in the context of one sub-array or in the context of the entire super-array
Problem 1 can be solved by simply iterating over your super-array and storing a reference to the largest element, or by building a heap as nbro said. Problem 2 has a similar solution. However, a linear scan through n arrays of length m is not going to be linear. So you will have to maintain your class invariants such that the maximum/minimum is known after every operation, maybe with the help of some data structure like a heap.
Assuming you mean contiguous sub-arrays, create the array of partial sums where Y_0 = 0 and Y_i = X_1 + ... + X_i, so from 1,4,2,3 create 0,1,1+4=5,1+4+2=7,1+4+2+3=10. You can create this from left to right in linear time, and the sum of any contiguous subarray is one partial sum subtracted from another, so 4+2+3 = (1+4+2+3) - 1 = 9.
Then scan through the partial sums from left to right, keeping track of the smallest value seen so far (including the initial zero). At each point subtract this from the current value and keep track of the highest value produced in this way. This should give you the value of the contiguous sub-array with largest sum, and you can keep index information, too, to find where this sub-array starts and ends.
To find the minimum, either change the above slightly or just reverse the sign of all the numbers and do exactly the same thing again: min(a, b) = -max(-a, -b)
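The partial-sum scan described above might look like this in Python. It is a sketch under the assumption that only the sum is needed (no start/end indices) and that only non-empty subarrays count; the function names are made up for illustration.

```python
def max_subarray_sum(values):
    """Largest sum of a non-empty contiguous subarray, via running partial sums."""
    prefix = 0           # current partial sum Y_i
    smallest_prefix = 0  # smallest partial sum seen before position i (initial zero included)
    best = None
    for v in values:
        prefix += v
        candidate = prefix - smallest_prefix
        if best is None or candidate > best:
            best = candidate
        smallest_prefix = min(smallest_prefix, prefix)
    return best


def min_subarray_sum(values):
    """Smallest subarray sum, using min(a, b) = -max(-a, -b)."""
    return -max_subarray_sum([-v for v in values])
```

The smallest partial sum is updated only after the candidate is computed, which is what keeps the empty subarray out of the result.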
I think what you are asking for is the maximum sum of a subarray.
Below is code that can do that in O(n) time.
#include <algorithm>
#include <vector>

int maxSumSubArr(const std::vector<int>& a)
{
    // If every element is negative, the best subarray is the single largest element.
    int maxsum = *std::max_element(a.begin(), a.end());
    if (maxsum < 0) return maxsum;
    int sum = 0;
    for (int x : a)
    {
        sum += x;
        if (sum > maxsum) maxsum = sum;
        if (sum < 0) sum = 0;  // a negative running sum never helps, so start over
    }
    return maxsum;
}
Note: This code is not tested; please add comments if you find any issues.

implementing an algorithm that requires minimal memory

I am trying to implement an algorithm to find N-th largest element that requires minimal memory.
Example:
List of integers: 1;9;5;7;2;5. N: 2
after duplicates are removed, the list becomes 1;9;5;7;2.
So, the answer is 7, because 7 is the 2-nd largest element in the modified list.
In the algorithm below I am using bubble sort to sort my list and then removing duplicates without using a temp variable. Does that make my program memory efficient? Any ideas or suggestions?
type Integer_array is Array (Natural range <>) of Integer;
procedure FindN-thLargestNumber (A : in out Integer_Array) is
    b : Integer;
    c:Integer;
begin
    //sorting of array
    for I in 0 to length(A) loop
        for J in 1 to length(A) loop
            if A[I] > A[J] then
                A[I] = A[I] + A[J];
                A[J] = A[I] - A[J];
                A[I] = A[I] - A[J];
            end if;
        end loop;
    end loop;
    //remove duplicates
    for K in 1 to length(A) loop
        IF A[b] != A[K] then
            b++;
            A[b]=A[K];
    end loop;
    c = ENTER TO FIND N-th Largest number
    PRINT A[b-(c-1)] ;
end FindN-th Largest Number
To find the N-th largest element you don't need to sort the main list at all. Note that this algorithm performs well if N is much smaller than the list size M. If N is a large fraction of the list size then you will be better off just sorting the list.
You just need a sorted list holding your N largest values, then pick the smallest from that list (this is all untested so will probably need a couple of tweaks):
int[] found = new int[n];
java.util.Arrays.fill(found, Integer.MIN_VALUE);
for (int i : list) {
    if (i > found[0]) {
        int insert = 0;
        // Find the first slot holding a value that is not smaller than i
        while (insert < found.length && found[insert] < i) {
            insert++;
        }
        // Skip values that are already stored, since duplicates must not count twice
        if (insert < found.length && found[insert] == i) {
            continue;
        }
        // Step back to the last slot holding a smaller value; i goes there
        insert--;
        // Insert the value and shuffle everything BELOW it down, dropping found[0]
        for (int j = insert; j >= 0; j--) {
            int temp = found[j];
            found[j] = i;
            i = temp;
        }
    }
}
At the end you have the top N values from your list sorted in order. the first entry in the list is Nth value, the last entry the top value.
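The same idea, keeping a small sorted list of the N largest distinct values, can be sketched in Python with the standard `bisect` module. The function name `nth_largest_distinct` is an illustrative choice, and the sketch raises an error when the input has fewer than N distinct values, which is an assumption about the desired behaviour.

```python
import bisect


def nth_largest_distinct(values, n):
    """Return the n-th largest distinct value using O(n) extra memory."""
    found = []  # ascending, holds at most n distinct values
    for v in values:
        pos = bisect.bisect_left(found, v)
        if pos < len(found) and found[pos] == v:
            continue  # duplicate of a stored value
        if len(found) < n:
            found.insert(pos, v)
        elif pos > 0:
            # v is larger than the current smallest: insert it, then drop the smallest.
            found.insert(pos, v)
            found.pop(0)
    if len(found) < n:
        raise ValueError("fewer than n distinct values")
    return found[0]
```

With the list from the question, `nth_largest_distinct([1, 9, 5, 7, 2, 5], 2)` walks through `[1]`, `[1, 9]`, `[5, 9]`, `[7, 9]` and returns 7.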
If you need the N-th largest element, then you don't need to sort the complete array. You should apply selection sort, but only for the required N steps.
Instead of using bubble sort, use a quicksort-style partial sort (quickselect).
Pick a key and, using it as a pivot, move the elements around (move all the elements >= pivot to the left of the array).
Count how many unique elements there are that are greater than or equal to the pivot.
If the number is less than N, then the answer is in the right part of the array. Otherwise it is in the left part of the array (left or right as compared to the pivot).
Iteratively repeat with the smaller array and the appropriate N.
Average complexity is O(n) and you will need constant extra memory.
HeapSort uses constant additional memory, so it has minimal space complexity, albeit it doesn't use a minimal number of variables.
It sorts in O(n log n) time, which I think is the optimal time complexity for this problem because of the need to ignore duplicates. I may be wrong.
Of course you don't need to complete the heapsort -- just heapify the array and then pop out the first N non-duplicate largest values.
If you really do want to minimise memory usage to the point that you care about one temporary variable one way or the other, then you probably have to accept terrible performance. This becomes a purely theoretical exercise, though -- it will not make your code more memory efficient in practice, because in non-recursive code there is no practical difference between using, say 64 bytes of stack vs using 128 bytes of stack.
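A rough Python sketch of the "heapify, then pop the first N non-duplicate largest values" idea. Python's `heapq` is a min-heap, so this sketch negates the values and therefore copies the list; an in-place max-heap, as in a real heapsort, would avoid that copy. The function name is hypothetical.

```python
import heapq


def nth_largest_distinct_heap(values, n):
    """Pop values from a heap in descending order until n distinct ones were seen."""
    heap = [-v for v in values]  # negate so the min-heap yields the largest first
    heapq.heapify(heap)          # O(len(values))
    previous = None
    distinct = 0
    while heap:
        v = -heapq.heappop(heap)  # O(log len(values)) per pop
        if v != previous:
            distinct += 1
            if distinct == n:
                return v
            previous = v
    raise ValueError("fewer than n distinct values")
```

Equal values come out of the heap next to each other, so comparing with the previously popped value is enough to skip duplicates.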

Most efficient way of randomly choosing a set of distinct integers

I'm looking for the most efficient algorithm to randomly choose a set of n distinct integers, where all the integers are in some range [0..maxValue].
Constraints:
maxValue is larger than n, and possibly much larger
I don't care if the output list is sorted or not
all integers must be chosen with equal probability
My initial idea was to construct a list of the integers [0..maxValue] then extract n elements at random without replacement. But that seems quite inefficient, especially if maxValue is large.
Any better solutions?
Here is an optimal algorithm, assuming that we are allowed to use hashmaps. It runs in O(n) time and space (and not O(maxValue) time, which is too expensive).
It is based on Floyd's random sample algorithm. See my blog post about it for details.
The code is in Java:
private static Random rnd = new Random();

public static Set<Integer> randomSample(int max, int n) {
    HashSet<Integer> res = new HashSet<Integer>(n);
    int count = max + 1;
    for (int i = count - n; i < count; i++) {
        Integer item = rnd.nextInt(i + 1);
        if (res.contains(item))
            res.add(i);
        else
            res.add(item);
    }
    return res;
}
For small values of maxValue such that it is reasonable to generate an array of all the integers in memory then you can use a variation of the Fisher-Yates shuffle except only performing the first n steps.
If n is much smaller than maxValue and you don't wish to generate the entire array then you can use this algorithm:
1. Keep a sorted list l of the numbers picked so far, initially empty.
2. Pick a random number x between 0 and maxValue - (number of elements in l).
3. For each number in l, in ascending order, if it is smaller than or equal to x, add 1 to x.
4. Add the adjusted value of x to the sorted list and repeat.
If n is very close to maxValue then you can randomly pick the elements that aren't in the result and then find the complement of that set.
Here is another algorithm that is simpler but has potentially unbounded execution time:
1. Keep a set s of the elements picked so far, initially empty.
2. Pick a number at random between 0 and maxValue.
3. If the number is not in s, add it to s.
4. Go back to step 2 until s has n elements.
In practice if n is small and maxValue is large this will be good enough for most purposes.
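The retry-on-collision variant is short enough to sketch directly. The function name and the explicit range check are assumptions made for this sketch; the check guards against asking for more numbers than the range holds, which would otherwise loop forever.

```python
import random


def sample_by_rejection(n, max_value, rng=random):
    """Pick n distinct integers from [0, max_value] by retrying on collisions."""
    if n > max_value + 1:
        raise ValueError("cannot pick more distinct values than the range holds")
    chosen = set()
    while len(chosen) < n:
        # Adding a value that is already present leaves the set unchanged,
        # which is exactly the "go back to step 2" case.
        chosen.add(rng.randint(0, max_value))
    return chosen
```

When n is small relative to maxValue collisions are rare, so the expected number of draws stays close to n.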
One way to do it without generating the full array.
Say I want a randomly selected subset of m items from a set {x1, ..., xn} where m <= n.
Consider element x1. I add x1 to my subset with probability m/n.
If I do add x1 to my subset then I reduce my problem to selecting (m - 1) items from {x2, ..., xn}.
If I don't add x1 to my subset then I reduce my problem to selecting m items from {x2, ..., xn}.
Lather, rinse, and repeat until m = 0.
This algorithm is O(n) where n is the number of items I have to consider.
I rather imagine there is an O(m) algorithm where at each step you consider how many elements to remove from the "front" of the set of possibilities, but I haven't convinced myself of a good solution and I have to do some work now!
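The element-by-element scheme (take the next element with probability "still needed / still available") can be sketched as below. It is a sketch that samples from the integers 0..n-1 rather than from an arbitrary set, and the name `selection_sample` is made up here.

```python
import random


def selection_sample(m, n, rng=random):
    """Choose m of the integers 0..n-1, visiting each candidate at most once."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    picked = []
    for x in range(n):
        if len(picked) == m:
            break  # nothing left to select
        remaining = n - x          # candidates not yet considered, including x
        needed = m - len(picked)   # items still to be selected
        # Take x with probability needed / remaining.
        if rng.randrange(remaining) < needed:
            picked.append(x)
    return picked
```

Once `needed` equals `remaining`, every later candidate is taken, so the result always has exactly m elements, in ascending order.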
If you are selecting M elements out of N, the strategy changes depending on whether M is of the same order as N or much less (i.e. less than about N/log N).
If they are similar in size, then you go through each item from 1 to N. You keep track of how many items you've got so far (let's call that m items picked out of n that you've gone through), and then you take the next number with probability (M-m)/(N-n) and discard it otherwise. You then update m and n appropriately and continue. This is a O(N) algorithm with low constant cost.
If, on the other hand, M is significantly less than N, then a resampling strategy is a good one. Here you will want to sort M so you can find them quickly (and that will cost you O(M log M) time--stick them into a tree, for example). Now you pick numbers uniformly from 1 to N and insert them into your list. If you find a collision, pick again. You will collide about M/N of the time (actually, you're integrating from 1/N to M/N), which will require you to pick again (recursively), so you'll expect to take M/(1-M/N) selections to complete the process. Thus, your cost for this algorithm is approximately O(M*(N/(N-M))*log(M)).
These are both such simple methods that you can just implement both--assuming you have access to a sorted tree--and pick the one that is appropriate given the fraction of numbers that will be picked.
(Note that picking numbers is symmetric with not picking them, so if M is almost equal to N, then you can use the resampling strategy, but pick those numbers to not include; this can be a win, even if you have to push all almost-N numbers around, if your random number generation is expensive.)
My solution is the same as Mark Byers'. It takes O(n^2) time, hence it's useful when n is much smaller than maxValue. Here's the implementation in python:
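import bisect
import random

def pick(n, maxValue):
    chosen = []
    for i in range(n):
        # i numbers are already taken, so draw from the remaining slots
        r = random.randint(0, maxValue - i)
        # skip over every chosen number that is not larger than r
        for e in chosen:
            if e <= r:
                r += 1
            else:
                break
        bisect.insort(chosen, r)
    return chosen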
The trick is to use a variation of shuffle or in other words a partial shuffle.
function random_pick( a, n )
{
    N = len(a);
    n = min(n, N);
    picked = array_fill(0, n, 0); backup = array_fill(0, n, 0);
    // partially shuffle the array, and generate unbiased selection simultaneously
    // this is a variation on fisher-yates-knuth shuffle
    for (i=0; i<n; i++) // O(n) times
    {
        selected = rand( 0, --N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
        value = a[ selected ];
        a[ selected ] = a[ N ];
        a[ N ] = value;
        backup[ i ] = selected;
        picked[ i ] = value;
    }
    // restore partially shuffled input array from backup
    // optional step, if needed it can be ignored
    for (i=n-1; i>=0; i--) // O(n) times
    {
        selected = backup[ i ];
        value = a[ N ];
        a[ N ] = a[ selected ];
        a[ selected ] = value;
        N++;
    }
    return picked;
}
NOTE: the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffle) and does not need hashmaps (which may not be available and/or usually hide complexity behind their implementation, e.g. fetch time is not O(1), it might even be O(n) in the worst case).
adapted from here
Linear congruential generator modulo maxValue+1. I'm sure I've written this answer before, but I can't find it...
UPDATE: I am wrong. The output of this is not uniformly distributed. Details on why are here.
I think this algorithm below is optimum. I.e. you cannot get better performance than this.
For choosing n numbers out of m numbers, the best offered algorithm so far is presented below. Its worst run time complexity is O(n), and needs only a single array to store the original numbers. It partially shuffles the first n elements from the original array, and then you pick those first n shuffled numbers as your solution.
This is also a fully working C program. What you find is:
Function getrand: This is just a PRNG that returns a number from 0 up to upto.
Function randselect: This is the function that randomly chooses n unique numbers out of m numbers. This is what this question is about.
Function main: This is only to demonstrate a use for other functions, so that you could compile it into a program and have fun.
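#include <stdio.h>
#include <stdlib.h>

/* Return a pseudo-random number in [0, upto); larger draws are rejected to avoid bias. */
int getrand(int upto) {
    int r;
    do {
        r = rand();
    } while (r >= upto);
    return r;
}

/* Partial Fisher-Yates shuffle: moves `select` randomly chosen elements to the front of `all`. */
void randselect(int *all, int end, int select) {
    int c;
    for (c = 0; c < select; c++) {
        /* split the usable range of rand() into equal bins, one per remaining element */
        int remaining = end - c;
        int upto = RAND_MAX - (RAND_MAX % remaining);
        int binwidth = upto / remaining;
        /* randomly choose some bin among positions c..end-1; choosing from the
           whole array at every step would bias the selection */
        int bin = c + getrand(upto) / binwidth;
        /* swap c with bin */
        int tmp = all[c];
        all[c] = all[bin];
        all[bin] = tmp;
    }
}

int main() {
    int end = 1000;
    int select = 5;
    /* initialize all numbers up to end */
    int *all = malloc(end * sizeof(int));
    int c;
    for (c = 0; c < end; c++) {
        all[c] = c;
    }
    /* select select unique numbers randomly */
    srand(0);
    randselect(all, end, select);
    for (c = 0; c < select; c++) printf("%d ", all[c]);
    putchar('\n');
    free(all);
    return 0;
}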
Here is the output of an example code where I randomly output 4 permutations out of a pool of 8 numbers for 100,000,000 many times. Then I use those many permutations to compute the probability of having each unique permutation occur. I then sort them by this probability. You notice that the numbers are fairly close, which I think means that it is uniformly distributed. The theoretical probability should be 1/1680 = 0.000595238095238095. Note how the empirical test is close to the theoretical one.

Algorithm for finding all combinations of k elements from stack of n "randomly"

I have a problem resembling the one described here:
Algorithm to return all combinations of k elements from n
I am looking for something similar that covers all possible combinations of k from n. However, I need a subset to vary a lot from the one drawn previously. For example, if I were to draw a subset of 3 elements from a set of 8, the following algorithm wouldn't be useful to me since every subset is very similar to the one previously drawn:
11100000,
11010000,
10110000,
01110000,
...
I am looking for an algorithm that picks the subsets in a more "random" looking fashion, i.e. where the majority of elements in one subset are not reused in the next:
11100000,
00010011,
00101100,
...
Does anyone know of such an algorithm?
I hope my question made sense and that someone can help me out =)
Kind regards,
Christian
How about first generating all possible combinations of k from n, and then rearranging them with help of a random function.
If you have the result in a vector, loop through the vector: for each element let it change place with the element at a random position.
This of course becomes slow for large k and n.
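In Python, generating everything first and then shuffling is a few lines with the standard library. This sketch uses `random.shuffle` rather than the swap-with-a-random-position loop described above, and it needs memory for all C(n, k) combinations, which is the cost already noted.

```python
import itertools
import random


def shuffled_combinations(n, k, rng=random):
    """Return every k-subset of range(n) exactly once, in random order."""
    combos = list(itertools.combinations(range(n), k))
    rng.shuffle(combos)  # in-place uniform shuffle of the whole list
    return combos
```

Every combination still appears exactly once; only the order changes, so consecutive subsets may still happen to overlap.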
This is not really random, but depending on your needs it might suit you.
Calculate the number of possible combinations. Let's name it N.
Calculate a large number which is coprime to N. Let's name it P.
Order the combinations and give them numbers from 1 to N. Let's name them C1 to CN.
Iterate over the output combinations. The first one is the combination numbered 1*P mod N, the second one the combination numbered 2*P mod N, the third one the combination numbered 3*P mod N, etc. In essence, Outputi = C(i*P mod N), where a result of 0 stands for CN. Mod is the modulus operator.
If P is picked carefully, you will get seemingly random combinations. Values of P close to 1 or to N will produce combinations that differ little. Better pick values close to, say, N/4 or N/5. You can also randomize the generation of P for every iteration that you need.
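A sketch of this stride idea in Python. It numbers the combinations from 0 to N-1 instead of 1 to N, and it assumes colexicographic order for turning a number into a combination; both are choices made for this sketch, and the function names are hypothetical. `math.comb` needs Python 3.8 or later.

```python
from math import comb, gcd


def unrank_colex(rank, k):
    """Return the k-subset (ascending, 0-based elements) with the given colex rank."""
    subset = []
    for i in range(k, 0, -1):
        # Find the largest element c with comb(c, i) <= rank.
        c = i - 1
        while comb(c + 1, i) <= rank:
            c += 1
        subset.append(c)
        rank -= comb(c, i)
    return subset[::-1]


def scattered_combinations(n, k, stride):
    """Yield every k-subset of range(n) once, visiting ranks i * stride mod N."""
    total = comb(n, k)
    if gcd(stride, total) != 1:
        raise ValueError("stride must be coprime to the number of combinations")
    for i in range(total):
        yield unrank_colex((i * stride) % total, k)
```

Because the stride is coprime to N, the ranks `i * stride mod N` hit every value from 0 to N-1 exactly once, so no combination is repeated or skipped.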
As a follow-up to my comment on this answer, here is some code that allows one to determine the composition of a subset from its "index", in colex order.
Shamelessly stolen from my own assignments.
//////////////////////////////////////
// NChooseK
//
// computes     n!
//           --------
//           k!(n-k)!
//
// using Pascal's identity
// i.e. (n,k) = (n-1,k-1) + (n-1,k)
//
// easily optimizable by memoization
long long NChooseK(int n, int k)
{
    if(k >= 0 && k <= n && n >= 1)
    {
        if( k > n / 2)
            k = n - k;
        if(k == 0 || n == 0)
            return 1;
        else
            return NChooseK(n-1, k-1) + NChooseK(n-1, k);
    }
    else
        return 0;
}
///////////////////////////////////////////////////////////////////////
// SubsetColexUnrank
// The unranking works by finding each element
// in turn, beginning with the biggest, leftmost one.
// We just have to find, for each element, how many subsets there are
// before the one beginning with the elements we have already found.
//
// It stores its results (indices of the elements present in the subset) into T, in ascending order.
void SubsetColexUnrank(long long r, int * T, int subsetSize)
{
    assert( subsetSize >= 1 );
    // For each element in the k-subset to be found
    for(int i = subsetSize; i >= 1; i--)
    {
        // T[i] cannot be less than i
        int x = i;
        // Find the smallest element such that, of all the k-subsets that contain it,
        // none has a rank that exceeds r.
        while( NChooseK(x, i) <= r )
            x++;
        // update T with the newly found element
        T[i] = x;
        // if the subset we have to find is not the first one containing this element
        if(r > 0)
        {
            // finding the next element of our k-subset
            // is like finding the first one of the same subset
            // divided by {T[i]}
            r -= NChooseK(x - 1, i);
        }
    }
}
Random-in, random-out.
The colex order is such that its unranking function does not need the size of the set from which to pick the elements to work; the number of elements is assumed to be NChooseK(size of the set, size of the subset).
How about randomly choosing the k elements, i.e. choose the p-th where p is random between 1 and n, then reorder what's left and choose the q-th where q is between 1 and n-1, etc.?
Or maybe I misunderstood. Do you still want all possibilities? In that case you can always generate them first and then choose random entries from your list.
By "random looking" I think you mean lexicographically distant.. does this apply to combination i vs. i-1, or i vs. all previous combinations?
If so, here are some suggestions:
Since most of the combinators yield ordered output, there are two options:
1. design or find a generator which somehow yields non-ordered output
2. enumerate and store enough/all combinations in a tie'd array file/db
If you decide to go with door #2, then you can just access randomly ordered combinations by random integers between 1 and the # of combinations
Just as a final check, compare the current and previous combination using a measure of difference/distance between combinations, e.g. for an unsigned Bit::Vector in Perl:
$vec1->Lexicompare($vec2) >= $MIN_LEX_DIST
You might take another look behind door #1, since even for moderate values of n and k you can get a big array:
EDIT:
Just saw your comment to AnnK... maybe the lexicompare might still help you skip similar combinations?
Depending on what you are trying to do, you could do something like playing cards. Keep two lists: Source is your source (unused) list, and Used is the "already-picked" list. As you randomly pick k items from Source, you move them to your Used list.
If there are k items left in Source when you need to pick again, you pick them all and swap the lists. If there are fewer than k items, you pick j items from Used and add them to Source to make k items in Source, then pick them all and swap the lists.
This is kind of like picking k cards from a deck. You discard them to the used pile. Once you reach the end or need more cards, you shuffle the old ones back into play.
This is just to make sure each set is definitely different from the previous subsets.
Also, this will not really guarantee that all possible subsets are picked before old ones start being repeated.
The good part is that you don't need to worry about pre-calculating all the subsets, and your memory requirements are linear in your data (2 n-sized lists).

Resources