Get X unique numbers from a set - algorithm

What is the most elegant way to grab unique random numbers, I ponder?
At the moment, when I need unique random numbers, I check whether a candidate is unique by looping over the numbers I've already used.
So it looks like:
int n = getRandomNumber() % arraySize;
for each (previously used number in list)
    check if n has been used before; if it has, try again.
There are many ways to solve this linear problem (n/2 comparisons on average); I just wonder if there is an elegant way. I'm trying to think back to MATH115 Discrete Mathematics and remember whether the old lecturer covered anything to do with this seemingly trivial problem.
I can't think at the moment, so maybe once I have some caffeine my brain will suss it out with the heightened IQ induced by the coffee.

If you want k random integers drawn without replacement (to get unique numbers) from the set {1, ..., n}, what you want is the first k elements in a random permutation of [n]. The most elegant way to generate such a random permutation is by using the Knuth shuffle. See here: http://en.wikipedia.org/wiki/Knuth_shuffle
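For illustration, a minimal Python sketch of this approach: a partial Fisher-Yates/Knuth shuffle that only performs the first k swap steps.

import random

def sample_unique(n, k):
    # Partial Fisher-Yates shuffle: after i steps, pool[:i] holds i
    # uniformly chosen, distinct values from range(n).
    pool = list(range(n))
    for i in range(k):
        j = random.randrange(i, n)  # pick from the not-yet-fixed suffix
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]

print(sample_unique(100, 5))  # e.g. [42, 7, 93, 0, 61]

(This is O(n) to set up the pool and O(k) to draw; the standard library's random.sample does essentially the same job.)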

1. Make an array of N unique elements (integers in the range 0..N-1, for example), and store N as both arraySize and initialArraySize (arraySize = N; initialArraySize = N).
2. When a random number is requested:
2.1 If arraySize is zero, reset it: arraySize = initialArraySize.
2.2 Generate index = getRandomNumber() % arraySize.
2.3 result = array[index]. Do not return result yet.
2.4 Swap array[index] with array[arraySize-1]. Swap means exchange: c = array[index]; array[index] = array[arraySize-1]; array[arraySize-1] = c.
2.5 Decrease arraySize by 1.
2.6 Return result.
You'll get a list of random numbers that won't repeat until you run out of unique values, at O(1) cost per request.
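A minimal Python sketch of these steps (assuming random.randrange as the underlying random source):

import random

class UniquePool:
    # Hands out the values in random order, O(1) per draw; refills
    # automatically once every value has been used (step 2.1).
    def __init__(self, values):
        self.array = list(values)
        self.initial_size = len(self.array)
        self.size = self.initial_size

    def next(self):
        if self.size == 0:                   # 2.1: refill
            self.size = self.initial_size
        index = random.randrange(self.size)  # 2.2
        result = self.array[index]           # 2.3
        self.array[index], self.array[self.size - 1] = \
            self.array[self.size - 1], self.array[index]  # 2.4: swap out
        self.size -= 1                       # 2.5
        return result                        # 2.6

pool = UniquePool(range(10))
print([pool.next() for _ in range(10)])  # a random permutation of 0..9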

An n-bit maximal-period Linear Feedback Shift Register (LFSR) will cycle through all of its 2^n - 1 nonzero internal states before any state repeats. An LFSR has maximal period if and only if the polynomial formed from its tap sequence plus 1 is a primitive polynomial mod 2.
Thus, an n-bit maximal-period LFSR will give you a sequence of 2^n - 1 unique random numbers, each of them n bits long.
An LFSR is very elegant.
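To make this concrete, here is a small Python sketch of a 16-bit Galois LFSR. The tap mask 0xB400 (polynomial x^16 + x^14 + x^13 + x^11 + 1) is a commonly cited maximal-period choice taken from standard LFSR references, not from this answer:

def lfsr16(seed=0xACE1):
    # Galois LFSR: shift right, and XOR in the tap mask when a 1 falls off.
    # With a maximal-period tap mask the state visits all 2^16 - 1 nonzero
    # values before repeating (the seed must be nonzero).
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400
        yield state

gen = lfsr16()
print(len({next(gen) for _ in range(2**16 - 1)}))  # 65535 distinct values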

Since you're imposing uniqueness, a pseudorandom generator should be sufficient, and it can be configured not to repeat for as long a sequence as you are likely to need. E.g., an LCG: if the seed is a uint32, initially 0, then use (1664525 * seed) + 1013904223 for the next seed and take the low word as your 16-bit result (note it is the 32-bit state, not the truncated output, that is guaranteed not to repeat within the period).
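A sketch of that generator in Python. The constants are the Numerical Recipes LCG parameters; by the Hull-Dobell theorem this LCG has full period, so the 32-bit state visits all 2^32 values before repeating:

def lcg16():
    seed = 0
    while True:
        seed = (1664525 * seed + 1013904223) & 0xFFFFFFFF  # next 32-bit state
        yield seed & 0xFFFF  # low word, as suggested above

g = lcg16()
print([next(g) for _ in range(5)])  # first five 16-bit outputs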

Related

How to iterate over subsets of a bitwise mask?

I have a bitwise mask represented as an integer; the mask and the integers are limited to 32 bits.
I am interested in examining all subsets of the set bits of a given mask/integer, but I don't know of a good way to quickly find these subsets.
The solution that I've been using is
for (int j = 1; j <= mask; ++j)
{
    if ((j & mask) == j)  // j's set bits all lie within mask
    {
        // j is a valid subset of mask
    }
}
But this requires looping from j = 1 to mask, and I think there should be a faster solution.
Is there a faster solution than this?
My follow-up question: if I want to constrain the subsets to a fixed size (i.e., a fixed number of set bits), is there a simple way to do that as well?
Iterate over all the subsets of state in C++:
for (int subset = state; subset > 0; subset = (subset - 1) & state) {}
This trick is commonly used in bitmask + DP problems. The total time complexity to iterate over all the subsets of all states is O(3^n), a great improvement over the O(4^n) of the code in this question.
For a bitmask of length n, the number of subsets of its set bits can be as large as 2^n - 1. Hence, if you want to examine them all, you have to traverse every subset, and the best possible complexity is proportional to the number of subsets. The code cannot be improved much beyond that.
PS: If you only want to count the subsets, that can be done with dynamic programming in much better time.
Given x with a subset of the bits in mask, the next subset in increasing order is ((x | ~mask) + 1) & mask. This wraps around to zero when x == mask.
I don't have a super fast way for subsets with a fixed number of bits.
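A small Python sketch of both iteration orders (Python chosen for brevity; the ~mask trick still works despite Python's arbitrary-precision integers, because & mask truncates the result):

def subsets_desc(mask):
    # All nonempty subsets of mask, in decreasing order.
    subset = mask
    while subset > 0:
        yield subset
        subset = (subset - 1) & mask

def subsets_asc(mask):
    # All subsets of mask, in increasing order, using the formula above;
    # starts at 0 and stops after yielding mask itself.
    x = 0
    while True:
        yield x
        x = ((x | ~mask) + 1) & mask
        if x == 0:
            break

print(list(subsets_asc(0b101)))   # [0, 1, 4, 5]
print(list(subsets_desc(0b101)))  # [5, 4, 1]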

Given a permutation's lexicographic number, is it possible to get any item in it in O(1)

I want to know whether the task explained below is even theoretically possible, and if so how I could do it.
You are given a space of N elements (i.e. all numbers between 0 and N-1.) Let's look at the space of all permutations on that space, and call it S. The ith member of S, which can be marked S[i], is the permutation with the lexicographic number i.
For example, if N is 3, then S is this list of permutations:
S[0]: 0, 1, 2
S[1]: 0, 2, 1
S[2]: 1, 0, 2
S[3]: 1, 2, 0
S[4]: 2, 0, 1
S[5]: 2, 1, 0
(Of course, when looking at a big N, this space becomes very large, N! to be exact.)
Now, I already know how to get the permutation by its index number i, and I already know how to do the reverse (get the lexicographic number of a given permutation.) But I want something better.
A single permutation can be huge by itself: consider, for example, N=10^20. (The size of S would be (10^20)!, which I believe is the biggest number I ever mentioned in a Stack Overflow question :)
If you're looking at just one random permutation on that space, it would be so big that you couldn't store the whole thing on your hard drive, let alone calculate each of its items by lexicographic number. What I want is to be able to do item access on that permutation, and also to get the index of each item. That is, given N and i to specify a permutation: one function that takes an index number and finds the number that resides at that index, and another function that takes a number and finds the index at which it resides. I want to do this in O(1), so I don't need to store or iterate over every member of the permutation.
Crazy, you say? Impossible? That may be. But consider this: A block cipher, like AES, is essentially a permutation, and it almost accomplishes the tasks I outlined above. AES has a block size of 16 bytes, meaning that N is 256^16 which is around 10^38. (The size of S, not that it matters, is a staggering (256^16)!, or around 10^85070591730234615865843651857942052838, which beats my recent record for "biggest number mentioned on Stack Overflow" :)
Each AES encryption key specifies a single permutation on N=256^16. That permutation couldn't be stored whole on your computer, because it has more members than there are atoms in the solar system. But it allows you item access. By encrypting data using AES, you're looking at the data block by block, and for each block (a member of range(N)) you output the encrypted block, which is the member of range(N) that sits at the index number of the original block in the permutation. And when you're decrypting, you're doing the reverse (finding the index number of a block). I believe this is done in O(1); I'm not sure, but in any case it's very fast.
The problem with using AES or any other block cipher is that it limits you to very specific N, and it probably only captures a tiny fraction of the possible permutations, while I want to be able to use any N I like, and do item access on any permutation S[i] that I like.
Is it possible to get O(1) item access on a permutation, given size N and permutation number i? If so, how?
(If I'm lucky enough to get code answers here, I'd appreciate if they'll be in Python.)
UPDATE:
Some people pointed out the sad fact that the permutation number itself would be so huge, that just reading the number would make the task non-feasible. Then, I'd like to revise my question: Given access to the factoradic representation of a permutation's lexicographic number, is it possible to get any item in the permutation in O(as small as possible)?
The secret to doing this is to "count in base factorial".
In the same way that 134 = 1*10^2 + 3*10 + 4, we have 134 = 5! + 2*3! + 2! => 10210 in factorial notation (include 1!, exclude 0!). To represent numbers up to N! you then need N factorial digits, where the digit with weight k! ranges from 0 to k. Up to a bit of confusion about what you call 0, this factorial representation is exactly the lexicographic number of a permutation.
You can use this insight to solve Euler Problem 24 by hand, so I will do that here, and you will see how to solve your problem. We want the millionth permutation of 0-9. In factorial representation, 1000000 => 266251220. Now to convert that to the permutation: I take my digits 0,1,2,3,4,5,6,7,8,9; the first factorial digit is 2, so I select the item at index 2 (counting from 0, it could be), which is 2. I then have the new list 0,1,3,4,5,6,7,8,9 and take the item at index 6, which is 7, and so on, giving 2783915604.
However, this assumes that you start your lexicographic ordering at 0; if you actually start it at 1, you have to subtract 1 first, which gives 2783915460. That is indeed the millionth permutation of the numbers 0-9.
You can obviously reverse this procedure, and hence convert easily back and forth between the lexicographic number and the permutation it represents.
I am not entirely clear what it is that you want to do here, but understanding the above procedure should help. For example, it's clear that the lexicographic number represents an ordering, which could be used as the key in a hashtable. And you can order the numbers by comparing digits left to right, so once you have inserted a number you never have to work out its factorial representation again.
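A short Python sketch of this decoding (a hedged illustration; the digit extraction and the list-picking are fused into one divmod loop):

from math import factorial

def nth_permutation(items, index):
    # Repeatedly divide by k! to get the factoradic digit, then use it to
    # pick (and remove) the next item, exactly as in the walkthrough above.
    items = list(items)
    result = []
    for k in range(len(items) - 1, -1, -1):
        digit, index = divmod(index, factorial(k))
        result.append(items.pop(digit))
    return result

# The millionth permutation of 0-9, zero-indexed (hence 999999):
print(''.join(map(str, nth_permutation(range(10), 999999))))  # 2783915460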
Your question is a bit moot, because your input size for an arbitrary permutation index has size log(N!) (assuming you want to represent all possible permutations) which is Theta(N log N), so if N is really large then just reading the input of the permutation index would take too long, certainly much longer than O(1). It may be possible to store the permutation index in such a way that if you already had it stored, then you could access elements in O(1) time. But probably any such method would be equivalent to just storing the permutation in contiguous memory (which also has Theta(N log N) size), and if you store the permutation directly in memory then the question becomes trivial assuming you can do O(1) memory access. (However you still need to account for the size of the bit encoding of the element, which is O(log N)).
In the spirit of your encryption analogy, perhaps you should specify a small SUBSET of permutations according to some property, and ask if O(1) or O(log N) element access is possible for that small subset.
Edit:
I misunderstood the question, but it was not in vain. Working through my algorithms made me understand: the factoradic representation of a permutation's lexicographic number is almost the same as the permutation itself. In fact, the first digit of the factoradic representation is the same as the first element of the corresponding permutation (assuming your space consists of the numbers from 0 to N-1). Knowing this, there is not really a point in storing the index rather than the permutation itself. To see how to convert the lexicographic number into a permutation, read below.
See also this wikipedia link about Lehmer code.
Original post:
In the space S there are N elements that can fill the first slot, and for each choice there are (N-1)! permutations that start with it. So a = i / (N-1)! is the first element. The permutations that start with a form a subset of (N-1)! elements: the possible permutations of the set N \ {a}. Now you can get the second element: it's (i % (N-1)!) / (N-2)!. Repeat the process and you've got the permutation.
The reverse is just as simple. Start with i = 0. Take the 2nd-to-last element of the permutation. Make a set of the last two elements and find its position in that set (it's either the 0th element or the 1st); call this position j, and add j * 1! to i. Move to the 3rd-to-last element, find its position among the last three, and add that position times 2!; and so on (you can start with the last element too, but it is always the 0th element of its possibilities, contributing nothing).
Java-ish pseudocode:
find_by_index(List N, int i) {
    String str = "";
    for (int l = N.length - 1; l >= 0; l--) {
        int pos = i / fact(l);  // which block of size l! the index falls in
        str += N.get(pos);
        N.remove(pos);
        i %= fact(l);           // remaining offset within that block
    }
    return str;
}
find_index(String str) {
    OrderedList N;
    int i = 0;
    for (int l = str.length - 1; l >= 0; l--) {
        char item = str.charAt(l);
        int pos = N.add(item);  // insert, returning the sorted position
        i += pos * fact(str.length - 1 - l);
    }
    return i;
}
find_by_index should run in O(n) assuming that N is pre-ordered, while find_index is O(n*log(n)) (where n is the size of the N space).
After some research on Wikipedia, I designed this algorithm:
def getPick(fact_num_list):
    """fact_num_list should be a list with the factorial number representation;
    getPick will return a tuple"""
    result = []  # desired pick
    # This holds all the numbers still pickable; not actually a set but a
    # list, so it stays ordered and indexable
    inputset = list(range(len(fact_num_list)))
    for fnl in fact_num_list:
        result.append(inputset[fnl])
        del inputset[fnl]  # make sure we can't pick the number again
    return tuple(result)
Obviously, this won't reach O(1), due to the fact that we need to "pick" every number. Since we do a single for loop, and assuming all the operations inside it are O(1), getPick runs in O(n).
If we need to convert from base 10 to factorial base, this is an aux function:
import math

def base10_baseFactorial(number):
    """Converts a base-10 number into a factorial-base number. Output is a list,
    for easier handling of digits over 36 (after using all of 0-9 and A-Z)"""
    loop = 1
    # Find the largest loop with loop! <= number
    while math.factorial(loop) <= number:
        loop += 1
    loop -= 1  # step back so we don't divide by a factorial larger than number
    result = []
    while loop > 0:
        denominator = math.factorial(loop)
        digit, number = divmod(number, denominator)
        result.append(digit)
        loop -= 1
    result.append(0)  # don't forget the 0! digit as well!
    return result
Again, this will run in O(n) due to the whiles.
Summing it all up, the best time we can get is O(n).
PS: I'm not a native English speaker, so spelling and phrasing errors may appear. Apologies in advance, and let me know if anything is unclear.
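A quick round-trip check of the two functions above (hypothetical usage; note that for indexes below 9! the digit list comes out shorter than 10 and would need left-padding with zeros):

digits = base10_baseFactorial(999999)  # -> [2, 6, 6, 2, 5, 1, 2, 1, 1, 0]
print(getPick(digits))                 # (2, 7, 8, 3, 9, 1, 5, 4, 6, 0)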
All correct algorithms for accessing the kth item of a permutation stored in factoradic form must read the first k digits. This is because, regardless of the values of the other digits among the first k, it makes a difference whether an unread digit is a 0 or takes on its maximum value. That this is the case can be seen by tracing the canonical correct decoding program in two parallel executions.
For example, if we want to decode the third item of the permutation whose factoradic code is 1?0, then for 100 that item is 2, and for 110 it is 0.

Generating permutations with sub-linear memory

I am wondering if there is a sufficiently simple algorithm for generating permutations of N elements, say 1..N, that uses less than O(N) memory. It does not have to compute the n-th permutation, but it must be able to produce every permutation.
Of course, such an algorithm would have to be a generator of some kind, or use some internal data structure smaller than O(N), since returning the result as a vector of size N already violates the restriction to sub-linear memory.
Let's assume that the random permutation is being generated one entry at a time. The state of the generator must encode the set of elements remaining (run it to completion to see why), and since no possibility can be excluded, the generator state is at least n bits.
Maybe you can, with factoradic numbers. You can extract the resulting permutation from it step by step, so you never have to have the entire result in memory.
But the reason I started with "maybe" is that I'm not sure how the size of the factoradic number itself grows. If it fit in a 32-bit integer or something like that, N would be limited to a constant, so O(N) would equal O(1); hence we have to use an array for it, and I'm unsure how big that array gets in terms of N.
I think the answer has to be "no".
Consider the generator for N-element permutations as a state machine: it must contain at least as many states as there are permutations, else it will start repeating before it finishes generating all of them.
There are N! such permutations, which will require at least ceil(log2(N!)) bits to represent. Stirling's approximation tells us log2(N!) is O(N log N), so we will be unable to create such a generator with sub-linear memory.
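A quick numeric check of that bound in Python:

from math import ceil, factorial, log2

# Minimum state, in bits, for an N-element permutation generator:
for n in (10, 20, 100):
    print(n, ceil(log2(factorial(n))))  # 10 -> 22, 20 -> 62, 100 -> 525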
The C++ algorithm next_permutation performs an in-place rearrangement of a sequence into its next permutation, or returns false when no further permutations exist. The algorithm is as follows:
template <class BidirectionalIterator>
bool next_permutation(BidirectionalIterator first, BidirectionalIterator last) {
    if (first == last) return false;  // false for empty ranges
    BidirectionalIterator i = first;
    ++i;
    if (i == last) return false;      // false for single-element ranges
    i = last;
    --i;
    for (;;) {
        BidirectionalIterator ii = i--;
        // Find the rightmost position with *i < *(i + 1).
        if (*i < *ii) {
            BidirectionalIterator j = last;
            // Find the last element greater than *i.
            while (!(*i < *--j)) {}
            // Swap them, and reverse the tail back to ascending order.
            iter_swap(i, j);
            reverse(ii, last);
            return true;
        }
        // The whole sequence is in descending order: no next permutation.
        if (i == first) {
            // Reverse the sequence back to its first (ascending) order.
            reverse(first, last);
            return false;
        }
    }
}
This uses constant space (the iterators) for each permutation generated. Do you consider that linear?
I think that to even store your result (which will be an ordered list of N items) will be O(N) in memory, no?
Anyhow, to answer your later question about picking a permutation at random, here's a technique that will be better than just producing all N! possibilities in a list, say, and then picking an index randomly. If we can just pick the index randomly and generate the associated permutation from it, we're much better off.
We can imagine the dictionary order on your words/permutations, and associate a unique number to these based on the word's/permutation's order of appearance in the dictionary. E.g., words on three characters would be
perm. index
012 <----> 0
021 <----> 1
102 <----> 2
120 <----> 3
201 <----> 4
210 <----> 5
You'll see later why it was easiest to use the numbers we did, but others could be accommodated with a bit more work.
To choose one at random, you could pick its associated index randomly from the range 0 ... N!-1 with a uniform probability (the simplest implementation of this is clearly out of the question for even moderately large N, I know, but I think there are decent workarounds) and then determine its associated permutation. Notice that the list begins with all the permutations of the last N-1 elements, keeping the first digit fixed equal to 0. After those possibilities are exhausted, we generate all those that start with 1. After these next (N-1)! permutations are exhausted, we generate those that start with a 2. Etc. Thus we can determine the leading digit is Floor[R / (N-1)!], where R was the index in the sense shown above. See now why we zero indexed, too?
To generate the remaining N-1 digits in the permutation, let's say that we determined Floor[R/(N-1)!] = a0. Start with the list {0, ..., N-1} - {a0} (set subtraction). We want the Qth permutation of this list, for Q = R mod (N-1)!. Except for accounting for the fact that there's a missing digit, this is just the same as the problem we've just solved. Recurse.
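A Python sketch of that recursion, written as a generator so entries stream out one at a time (note the remaining-element list still costs O(N) memory, consistent with the lower-bound answers above):

from math import factorial

def permutation_digits(n, R):
    # Leading digit is Floor[R / (N-1)!]; recurse on R mod (N-1)! with the
    # chosen element removed, exactly as described above.
    remaining = list(range(n))
    for position in range(n - 1, -1, -1):
        q, R = divmod(R, factorial(position))
        yield remaining.pop(q)

print(list(permutation_digits(3, 4)))  # [2, 0, 1], matching index 4 above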

Greatest GCD between some numbers

We've got some nonnegative numbers, and we want to find the pair with the maximum gcd. Actually, this maximum is more important than the pair itself!
For example if we have:
2 4 5 15
gcd(2,4)=2
gcd(2,5)=1
gcd(2,15)=1
gcd(4,5)=1
gcd(4,15)=1
gcd(5,15)=5
The answer is 5.
You can use the Euclidean algorithm to find the GCD of two numbers:
int gcd(int a, int b)
{
    while (b != 0)
    {
        int m = a % b;  // the remainder carries all common divisors
        a = b;
        b = m;
    }
    return a;
}
If you want an alternative to the obvious algorithm, then assuming your numbers are in a bounded range, and you have plenty of memory, you can beat O(N^2) time, N being the number of values:
Create an array of a small integer type, with indexes 1 to the max input. O(1)
For each value, increment the count at every index that divides the value (make sure the counters don't wrap around). O(N)
Starting at the end of the array, scan back until you find a count >= 2. O(1)
That tells you the max gcd, but doesn't tell you which pair produced it. For your example input, the computed array looks like this:
index: 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
count: 4  2  1  1  2  0  0  0  0   0   0   0   0   0   1
I don't know whether this is actually any faster for the inputs you have to handle. The constant factors involved are large: the bound on your values and the time to factorise a value within that bound.
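A Python sketch of those three steps (divisors found here by plain trial division, one of several options this answer goes on to mention):

def max_gcd_by_counting(values):
    # Count, for every candidate g, how many inputs g divides; the largest
    # index with count >= 2 is the maximum pairwise gcd.
    limit = max(values)
    counts = [0] * (limit + 1)
    for v in values:
        d = 1
        while d * d <= v:
            if v % d == 0:
                counts[d] += 1
                if d != v // d:
                    counts[v // d] += 1
            d += 1
    for g in range(limit, 0, -1):
        if counts[g] >= 2:
            return g
    return None

print(max_gcd_by_counting([2, 4, 5, 15]))  # 5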
You don't have to factorise each value - you could use memoisation and/or a pregenerated list of primes. Which gives me the idea that if you are memoising the factorisation, you don't need the array:
Create an empty set of int, and a best-so-far value 1.
For each input integer:
if it's less than or equal to best-so-far, continue.
check whether it's in the set. If so, best-so-far = max(best-so-far, this-value), continue. If not:
add it to the set
repeat for all of its factors (larger than best-so-far).
Add/lookup in a set could be O(log N), although it depends what data structure you use. Each value has O(f(k)) factors, where k is the max value and I can't remember what the function f is...
The reason that you're finished with a value as soon as you encounter it in the set is that you've found a number which is a common factor of two input values. If you keep factorising, you'll only find smaller such numbers, which are not interesting.
I'm not quite sure what the best way is to repeat for the larger factors. I think in practice you might have to strike a balance: you don't want to do them quite in decreasing order because it's awkward to generate ordered factors, but you also don't want to actually find all the factors.
Even in the realms of O(N^2), you might be able to beat the use of the Euclidean algorithm:
Fully factorise each number, storing it as a sequence of exponents of primes (so for example 2 is {1}, 4 is {2}, 5 is {0, 0, 1}, 15 is {0, 1, 1}). Then you can calculate gcd(a,b) by taking the min value at each index and multiplying them back out. No idea whether this is faster than Euclid on average, but it might be. Obviously it uses a load more memory.
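A small Python sketch of that representation (for simplicity the candidate sequence here is 2, 3, 5, 7, 9, ...; composite candidates always end up with exponent 0, since their prime factors were already divided out, so they contribute nothing to the product):

def exponents(n):
    # e.g. 15 -> [0, 1, 1], matching the answer's examples.
    exps, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
        p += 1 if p == 2 else 2
    return exps

def gcd_from_exponents(a_exps, b_exps):
    # Take the min exponent at each index and multiply back out.
    g, p = 1, 2
    for ea, eb in zip(a_exps, b_exps):
        g *= p ** min(ea, eb)
        p += 1 if p == 2 else 2
    return g

print(gcd_from_exponents(exponents(4), exponents(15)))  # 1
print(gcd_from_exponents(exponents(5), exponents(15)))  # 5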
The optimisations I can think of are:
1) Start with the two biggest numbers, since they are likely to have the most prime factors and thus likely to share the most prime factors (and hence have the highest GCD).
2) When calculating the GCDs of other pairs, you can stop your Euclidean algorithm loop early once the remainder drops below your current greatest GCD, since the final gcd can be no larger.
Off the top of my head I can't think of a way that you can work out the greatest GCD of a pair without trying to work out each pair individually (and optimise a bit as above).
Disclaimer: I've never looked at this problem before and the above is off the top of my head. There may be better ways and I may be wrong. I'm happy to discuss my thoughts in more length if anybody wants. :)
There is no O(n log n) solution to this problem in general. In fact, the worst case is O(n^2) in the number of items in the list. Consider the following set of numbers:
2^20 3^13 5^9 7^2*11^4 7^4*11^3
Only the GCD of the last two is greater than 1, but the only way to know that from looking at the GCDs is to try out every pair and notice that one of them is greater than 1.
So you're stuck with the boring brute-force try-every-pair approach, perhaps with a couple of clever optimizations to avoid doing needless work when you've already found a large GCD (while making sure that you don't miss anything).
With some constraints, e.g. if the numbers in the array are within a given range, say 1 to 1e7, it is doable in O(N log N) / O(MAX log MAX), where MAX is the maximum possible value in A.
This is inspired by the sieve algorithm; I came across it in a HackerRank challenge, where it is done for two arrays. Check their editorial.
find min(A) and max(A): O(N)
create a boolean mask marking which elements of A appear in the given range, for O(1) lookup: O(N) to build, O(MAX_RANGE) storage
for every number a in the range (min(A), max(A)):
    for aa = a; aa <= max(A); aa += a:
        if aa is in A, increment a counter for a, and once the counter reaches 2 (i.e., two numbers are divisible by a), compare a to the current max_gcd
store the top two candidates for each GCD candidate
you could also skip candidates a that are less than the current max_gcd
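A Python sketch of this sieve (assuming distinct positive values; a duplicated value v would trivially give gcd v):

def max_gcd_sieve(values):
    limit = max(values)
    present = [False] * (limit + 1)
    for v in values:
        present[v] = True
    best = 1
    for g in range(limit, 0, -1):     # candidates, largest first
        if g <= best:
            break                     # nothing larger is possible
        count = 0
        for multiple in range(g, limit + 1, g):
            if present[multiple]:
                count += 1
                if count >= 2:        # two inputs divisible by g
                    best = g
                    break
    return best

print(max_gcd_sieve([2, 4, 5, 15]))  # 5

The nested loops do about MAX/1 + MAX/2 + ... + MAX/MAX iterations in total, which is the O(MAX log MAX) harmonic bound mentioned above.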
Previous answer:
Still O(N^2) -- sort the array; should eliminate some of the unnecessary comparisons;
max_gcd = 1
# assuming you want pairs of distinct elements
sort(a)  # assume in place
for ii = n - 1 : -1 : 0 do
    if a[ii] <= max_gcd
        break
    for jj = ii - 1 : -1 : 0 do
        if a[jj] <= max_gcd
            break
        current_gcd = GCD(a[ii], a[jj])
        if current_gcd > max_gcd:
            max_gcd = current_gcd
This should save some unnecessary computation.
There is a solution that would take O(n):
Let our numbers be a_i. First, calculate m=a_0*a_1*a_2*.... For each number a_i, calculate gcd(m/a_i, a_i). The number you are looking for is the maximum of these values.
I haven't proved that this is always true, but in your example, it works:
m=2*4*5*15=600,
max(gcd(m/2,2), gcd(m/4,4), gcd(m/5,5), gcd(m/15,15))=max(2, 2, 5, 5)=5
NOTE: This is not correct. If the number a_i has a factor p_j repeated twice, and two other numbers also contain this factor p_j, then you get the incorrect result p_j^2 instead of p_j. For example, for the set 3, 5, 15, 25, you get 25 as the answer instead of 5.
However, you can still use this to quickly filter out numbers. For example, in the above case, once you determine the 25, you can first do the exhaustive search for a_3=25 with gcd(a_3, a_i) to find the real maximum, 5, then filter out gcd(m/a_i, a_i), i!=3 which are less than or equal to 5 (in the example above, this filters out all others).
Added for clarification and justification:
To see why this should work, note that gcd(a_i, a_j) divides gcd(m/a_i, a_i) for all j!=i.
Let's call gcd(m/a_i, a_i) as g_i, and max(gcd(a_i, a_j),j=1..n, j!=i) as r_i. What I say above is g_i=x_i*r_i, and x_i is an integer. It is obvious that r_i <= g_i, so in n gcd operations, we get an upper bound for r_i for all i.
The above claim is not very obvious. Let's examine it a bit deeper to see why it is true: the gcd of a_i and a_j is the product of all prime factors that appear in both a_i and a_j (by definition). Now, multiply a_j with another number, b. The gcd of a_i and b*a_j is either equal to gcd(a_i, a_j), or is a multiple of it, because b*a_j contains all prime factors of a_j, and some more prime factors contributed by b, which may also be included in the factorization of a_i. In fact, gcd(a_i, b*a_j)=gcd(a_i/gcd(a_i, a_j), b)*gcd(a_i, a_j), I think. But I can't see a way to make use of this. :)
Anyhow, in our construction, m/a_i is simply a shortcut for the product of all a_j with j=1..n, j!=i. As a result, gcd(m/a_i, a_i) contains every gcd(a_i, a_j) as a factor. So, obviously, the maximum of these individual gcd results will divide g_i.
Now, the largest g_i is of particular interest to us: it is either the maximum gcd itself (if x_i is 1), or a good candidate for being one. To do that, we do another n-1 gcd operations, and calculate r_i explicitly. Then, we drop all g_j less than or equal to r_i as candidates. If we don't have any other candidate left, we are done. If not, we pick up the next largest g_k, and calculate r_k. If r_k <= r_i, we drop g_k, and repeat with another g_k'. If r_k > r_i, we filter out remaining g_j <= r_k, and repeat.
I think it is possible to construct a number set that will make this algorithm run in O(n^2) (if we fail to filter out anything), but on random number sets, I think it will quickly get rid of large chunks of candidates.
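A Python sketch of this candidate-filtering idea (assuming positive integers; math.prod needs Python 3.8+):

from math import gcd, prod

def max_pair_gcd(a):
    m = prod(a)
    g = [gcd(m // x, x) for x in a]  # n gcds: the upper bound g_i for each r_i
    order = sorted(range(len(a)), key=g.__getitem__, reverse=True)
    best = 0
    for i in order:
        if g[i] <= best:
            break  # every remaining candidate is bounded by a verified result
        # verify: compute r_i explicitly with n-1 more gcds
        best = max(best, max(gcd(a[i], a[j])
                             for j in range(len(a)) if j != i))
    return best

print(max_pair_gcd([3, 5, 15, 25]))  # 5, even though g for 25 is 25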
pseudocode:
function getGcdMax(array[])
    arrayUB = upperbound(array)
    if (arrayUB < 1)
        error  // need at least two elements
    pointerA = 0
    pointerB = 1
    gcdMax = 0
    do
        gcdMax = MAX(gcdMax, gcd(array[pointerA], array[pointerB]))
        pointerB++
        if (pointerB > arrayUB)
            pointerA++
            pointerB = pointerA + 1
    until (pointerB > arrayUB)
    return gcdMax

Random number in range 0 to n

Given a function R which produces true random 32-bit numbers, I would like a function that returns random integers in the range 0 to n, where n is arbitrary (less than 2^32).
The function must produce all values 0 to n with equal probability.
I would like a function that executes in constant time with no if statements or loops, so something like the Java Random.nextInt(n) function is out.
I suspect that a simple modulus will not do the job unless n is a power of 2 -- am I right?
I have accepted Jason's answer, despite it requiring a loop of undetermined duration, since it appears to be the best method to use in practice and essentially answers my question. However I am still interested in any algorithms (even if less efficient) which would be deterministic in nature and be guaranteed to terminate, such as Mark Byers has pointed to.
Without discarding some of the values from the source, you cannot do this. For example, a set of size 2^32 cannot be partitioned into three equally sized sets. Therefore, it is impossible to do this without discarding some of the values and iterating until a non-discarded value is produced.
So, just use this (pseudocode):
rng is a random number generator producing uniform integers in [0, max)
compute m = max modulo (n + 1)
do {
    draw a random number r from rng
} while (r >= max - m)
return r modulo (n + 1)
Effectively, I am throwing out the top part of the distribution that causes problems. If rng is uniform on [0, max), then this algorithm is uniform on [0, n], because max - m is an exact multiple of n + 1.
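A Python sketch of this rejection step for a 32-bit source (random.getrandbits stands in for the R of the question):

import random

def uniform_to_n(n):
    max_value = 2**32
    m = max_value % (n + 1)    # size of the uneven tail
    while True:
        r = random.getrandbits(32)
        if r < max_value - m:  # keep only the evenly divisible part
            return r % (n + 1) # now exactly uniform on 0..n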
What you're asking for is impossible. You can't partition 2**32 numbers into three sets of exactly equal size.
If you want to guarantee an absolutely perfect uniform distribution in 0 <= x < n, where n is not a power of 2 then you have to be prepared to call R potentially an infinite number of times. In reality you will typically need only one or two calls, but the code has to in theory be able call R any number of times otherwise it can't be completely uniform.
I don't understand why a modulus wouldn't do what you want. Since R is a function that produces true random 32-bit numbers, each number has the same probability of being produced, right? So, if you use a modulus n:
randomNumber = R() % (n + 1) //EDITED: n+1 to return values from 0-n
then each number from 0 to n has the same probability!
You can generate two 32-bit numbers and put them together to form a 64-bit number. If you do not discard any values (and you need a result of no more than 32 bits), the worst-case bias is a factor of 0.99999999976716936, meaning that some numbers are produced with that much lower probability than others.
But if you still want to remove this small bias, the ratio of "out of range" hits will be very low, so you will rarely need more than one discarded draw.
Depending upon your problem/use of the random numbers, maybe you could pre-allocate your random numbers using a slow method and put them into a simple array.
Then getNextRnd() can just return the next in the array.
A quick, fixed-time call with no branches, just spending memory (which is usually pretty cheap) and process-initialization time.
