Reverse factorial - algorithm

Reverse factorial - algorithm

Well, we all know that if N is given it's easy to calculate N!. But what about the inverse?
N! is given and you are about to find N - Is that possible ? I'm curious.

Set X=1.
Generate F=X!
Is F = the input? If yes, then X is N.
If not, then set X=X+1, then start again at #2.
You can optimize by using the previous result of F to compute the new F (new F = new X * old F).
It's just as fast as going the opposite direction, if not faster, given that division generally takes longer than multiplication. A given factorial A! is guaranteed to have all integers less than A as factors in addition to A, so you'd spend just as much time factoring those out as you would just computing a running factorial.

If you have Q=N! in binary, count the trailing zeros. Call this number J.
If N is 2K or 2K+1, then J is equal to 2K minus the number of 1's in the binary representation of 2K, so add 1 over and over until the number of 1's you have added is equal to the number of 1's in the result.
Now you know 2K, and N is either 2K or 2K+1. To tell which one it is, count the factors of the biggest prime (or any prime, really) in 2K+1, and use that to test Q=(2K+1)!.
For example, suppose Q (in binary) is
1111001110111010100100110000101011001111100000110110000000000000000000
(Sorry it's so small, but I don't have tools handy to manipulate larger numbers.)
There are 19 trailing zeros, which is
10011
Now increment:
1: 10100
2: 10101
3: 10110 bingo!
So N is 22 or 23. I need a prime factor of 23, and, well, I have to pick 23 (it happens that 2K+1 is prime, but I didn't plan that and it isn't needed). So 23^1 should divide 23!, it doesn't divide Q, so
N=22

int inverse_factorial(int factorial){
int current = 1;
while (factorial > current) {
if (factorial % current) {
return -1; //not divisible
}
factorial /= current;
++current;
}
if (current == factorial) {
return current;
}
return -1;
}

Yes. Let's call your input x. For small values of x, you can just try all values of n and see if n! = x. For larger x, you can binary-search over n to find the right n (if one exists). Note hat we have n! ≈ e^(n ln n - n) (this is Stirling's approximation), so you know approximately where to look.
The problem of course, is that very few numbers are factorials; so your question makes sense for only a small set of inputs. If your input is small (e.g. fits in a 32-bit or 64-bit integer) a lookup table would be the best solution.
(You could of course consider the more general problem of inverting the Gamma function. Again, binary search would probably be the best way, rather than something analytic. I'd be glad to be shown wrong here.)
Edit: Actually, in the case where you don't know for sure that x is a factorial number, you may not gain all that much (or anything) with binary search using Stirling's approximation or the Gamma function, over simple solutions. The inverse factorial grows slower than logarithmic (this is because the factorial is superexponential), and you have to do arbitrary-precision arithmetic to find factorials and multiply those numbers anyway.
For instance, see Draco Ater's answer for an idea that (when extended to arbitrary-precision arithmetic) will work for all x. Even simpler, and probably even faster because multiplication is faster than division, is Dav's answer which is the most natural algorithm... this problem is another triumph of simplicity, it appears. :-)

Well, if you know that M is really the factorial of some integer, then you can use
n! = Gamma(n+1) = sqrt(2*PI) * exp(-n) * n^(n+1/2) + O(n^(-1/2))
You can solve this (or, really, solve ln(n!) = ln Gamma(n+1)) and find the nearest integer.
It is still nonlinear, but you can get an approximate solution by iteration easily (in fact, I expect the n^(n+1/2) factor is enough).

Multiple ways. Use lookup tables, use binary search, use a linear search...
Lookup tables is an obvious one:
for (i = 0; i < MAX; ++i)
Lookup[i!] = i; // you can calculate i! incrementally in O(1)
You could implement this using hash tables for example, or if you use C++/C#/Java, they have their own hash table-like containers.
This is useful if you have to do this a lot of times and each time it has to be fast, but you can afford to spend some time building this table.
Binary search: assume the number is m = (1 + N!) / 2. Is m! larger than N!? If yes, reduce the search between 1 and m!, otherwise reduce it between m! + 1 and N!. Recursively apply this logic.
Of course, these numbers might be very big and you might end up doing a lot of unwanted operations. A better idea is to search between 1 and sqrt(N!) using binary search, or try to find even better approximations, though this might not be easy. Consider studying the gamma function.
Linear search: Probably the best in this case. Calculate 1*2*3*...*k until the product is equal to N! and output k.

If the input number is really N!, its fairly simple to calculate N.
A naive approach computing factorials will be too slow, due to the overhead of big integer arithmetic. Instead we can notice that, when N ≥ 7, each factorial can be uniquely identified by its length (i.e. number of digits).
The length of an integer x can be computed as log10(x) + 1.
Product rule of logarithms: log(a*b) = log(a) + log(b)
By using above two facts, we can say that length of N! is:
which can be computed by simply adding log10(i) until we get length of our input number, since log(1*2*3*...*n) = log(1) + log(2) + log(3) + ... + log(n).
This C++ code should do the trick:
double result = 0;
for (int i = 1; i <= 1000000; ++i) { // This should work for 1000000! (where inputNumber has 10^7 digits)
result += log10(i);
if ( (int)result + 1 == inputNumber.size() ) { // assuming inputNumber is a string of N!
std::cout << i << endl;
break;
}
}
(remember to check for cases where n<7 (basic factorial calculation should be fine here))
Complete code: https://pastebin.com/9EVP7uJM

Here is some clojure code:
(defn- reverse-fact-help [n div]
(cond (not (= 0 (rem n div))) nil
(= 1 (quot n div)) div
:else (reverse-fact-help (/ n div) (+ div 1))))
(defn reverse-fact [n] (reverse-fact-help n 2))
Suppose n=120, div=2. 120/2=60, 60/3=20, 20/4=5, 5/5=1, return 5
Suppose n=12, div=2. 12/2=6, 6/3=2, 2/4=.5, return 'nil'

int p = 1,i;
//assume variable fact_n has the value n!
for(i = 2; p <= fact_n; i++) p = p*i;
//i is the number you are looking for if p == fact_n else fact_n is not a factorial
I know it isn't a pseudocode, but it's pretty easy to understand

inverse_factorial( X )
{
X_LOCAL = X;
ANSWER = 1;
while(1){
if(X_LOCAL / ANSWER == 1)
return ANSWER;
X_LOCAL = X_LOCAL / ANSWER;
ANSWER = ANSWER + 1;
}
}

This function is based on successive approximations! I created it and implemented it in Advanced Trigonometry Calculator 1.7.0
double arcfact(double f){
double result=0,precision=1000;
int i=0;
if(f>0){
while(precision>1E-309){
while(f>fact(result+precision)&&i<10){
result=result+precision;
i++;
}
precision=precision/10;
i=0;
}
}
else{
result=0;
}
return result;
}

If you do not know whether a number M is N! or not, a decent test is to test if it's divisible by all the small primes until the Sterling approximation of that prime is larger than M. Alternatively, if you have a table of factorials but it doesn't go high enough, you can pick the largest factorial in your table and make sure M is divisible by that.

In C from my app Advanced Trigonometry Calculator v1.6.8
double arcfact(double f) {
double i=1,result=f;
while((result/(i+1))>=1) {
result=result/i;
i++;
}
return result;
}
What you think about that? Works correctly for factorials integers.

Simply divide by positive numbers, i.e: 5!=120 ->> 120/2 = 60 || 60/3 = 20 || 20/4 = 5 || 5/5 = 1
So the last number before result = 1 is your number.
In code you could do the following:
number = res
for x=2;res==x;x++{
res = res/x
}
or something like that. This calculation needs improvement for non-exact numbers.

Most numbers are not in the range of outputs of the factorial function. If that is what you want to test, it's easy to get an approximation using Stirling's formula or the number of digits of the target number, as others have mentioned, then perform a binary search to determine factorials above and below the given number.
What is more interesting is constructing the inverse of the Gamma function, which extends the factorial function to positive real numbers (and to most complex numbers, too). It turns out construction of an inverse is a difficult problem. However, it was solved explicitly for most positive real numbers in 2012 in the following paper: http://www.ams.org/journals/proc/2012-140-04/S0002-9939-2011-11023-2/S0002-9939-2011-11023-2.pdf . The explicit formula is given in Corollary 6 at the end of the paper.
Note that it involves an integral on an infinite domain, but with a careful analysis I believe a reasonable implementation could be constructed. Whether that is better than a simple successive approximation scheme in practice, I don't know.

C/C++ code for what the factorial (r is the resulting factorial):
int wtf(int r) {
int f = 1;
while (r > 1)
r /= ++f;
return f;
}
Sample tests:
Call: wtf(1)
Output: 1
Call: wtf(120)
Output: 5
Call: wtf(3628800)
Output: 10

Based on:
Full inverted factorial valid for x>1
Use the suggested calculation. If factorial is expressible in full binary form the algorithm is:
Suppose input is factorial x, x=n!
Return 1 for 1
Find the number of trailing 0's in binary expansion of the factorial x, let us mark it with t
Calculate x/fact(t), x divided by the factorial of t, mathematically x/(t!)
Find how many times x/fact(t) divides t+1, rounded down to the nearest integer, let us mark it with m
Return m+t
__uint128_t factorial(int n);
int invert_factorial(__uint128_t fact)
{
if (fact == 1) return 1;
int t = __builtin_ffs(fact)-1;
int res = fact/factorial(t);
return t + (int)log(res)/log(t+1);
}
128-bit is giving in on 34!

Related

Example of Big O of 2^n

So I can picture what an algorithm is that has a complexity of n^c, just the number of nested for loops.
for (var i = 0; i < dataset.len; i++ {
for (var j = 0; j < dataset.len; j++) {
//do stuff with i and j
}
}
Log is something that splits the data set in half every time, binary search does this (not entirely sure what code for this looks like).
But what is a simple example of an algorithm that is c^n or more specifically 2^n. Is O(2^n) based on loops through data? Or how data is split? Or something else entirely?

Algorithms with running time O(2^N) are often recursive algorithms that solve a problem of size N by recursively solving two smaller problems of size N-1.
This program, for instance prints out all the moves necessary to solve the famous "Towers of Hanoi" problem for N disks in pseudo-code
void solve_hanoi(int N, string from_peg, string to_peg, string spare_peg)
{
if (N<1) {
return;
}
if (N>1) {
solve_hanoi(N-1, from_peg, spare_peg, to_peg);
}
print "move from " + from_peg + " to " + to_peg;
if (N>1) {
solve_hanoi(N-1, spare_peg, to_peg, from_peg);
}
}
Let T(N) be the time it takes for N disks.
We have:
T(1) = O(1)
and
T(N) = O(1) + 2*T(N-1) when N>1
If you repeatedly expand the last term, you get:
T(N) = 3*O(1) + 4*T(N-2)
T(N) = 7*O(1) + 8*T(N-3)
...
T(N) = (2^(N-1)-1)*O(1) + (2^(N-1))*T(1)
T(N) = (2^N - 1)*O(1)
T(N) = O(2^N)
To actually figure this out, you just have to know that certain patterns in the recurrence relation lead to exponential results. Generally T(N) = ... + C*T(N-1) with C > 1means O(x^N). See:
https://en.wikipedia.org/wiki/Recurrence_relation

Think about e.g. iterating over all possible subsets of a set. This kind of algorithms is used for instance for a generalized knapsack problem.
If you find it hard to understand how iterating over subsets translates to O(2^n), imagine a set of n switches, each of them corresponding to one element of a set. Now, each of the switches can be turned on or off. Think of "on" as being in the subset. Note, how many combinations are possible: 2^n.
If you want to see an example in code, it's usually easier to think about recursion here, but I can't think od any other nice and understable example right now.

Consider that you want to guess the PIN of a smartphone, this PIN is a 4-digit integer number. You know that the maximum number of bits to hold a 4-digit number is 14 bits. So, you will have to guess the value, the 14-bit correct combination let's say, of this PIN out of the 2^14 = 16384 possible values!!
The only way is to brute force. So, for simplicity, consider this simple 2-bit word that you want to guess right, each bit has 2 possible values, 0 or 1. So, all the possibilities are:
00
01
10
11
We know that all possibilities of an n-bit word will be 2^n possible combinations. So, 2^2 is 4 possible combinations as we saw earlier.
The same applies to the 14-bit integer PIN, so guessing the PIN would require you to solve a 2^14 possible outcome puzzle, hence an algorithm of time complexity O(2^n).
So, those types of problems, where combinations of elements in a set S differs, and you will have to try to solve the problem by trying all possible combinations, will have this O(2^n) time complexity. But, the exponentiation base does not have to be 2. In the example above it's of base 2 because each element, each bit, has two possible values which will not be the case in other problems.
Another good example of O(2^n) algorithms is the recursive knapsack. Where you have to try different combinations to maximize the value, where each element in the set, has two possible values, whether we take it or not.
The Edit Distance problem is an O(3^n) time complexity since you have 3 decisions to choose from for each of the n characters string, deletion, insertion, or replace.

int Fibonacci(int number)
{
if (number <= 1) return number;
return Fibonacci(number - 2) + Fibonacci(number - 1);
}
Growth doubles with each additon to the input data set. The growth curve of an O(2N) function is exponential - starting off very shallow, then rising meteorically.
My example of big O(2^n), but much better is this:
public void solve(int n, String start, String auxiliary, String end) {
if (n == 1) {
System.out.println(start + " -> " + end);
} else {
solve(n - 1, start, end, auxiliary);
System.out.println(start + " -> " + end);
solve(n - 1, auxiliary, start, end);
}
In this method program prints all moves to solve "Tower of Hanoi" problem.
Both examples are using recursive to solve problem and had big O(2^n) running time.

c^N = All combinations of n elements from a c sized alphabet.
More specifically 2^N is all numbers representable with N bits.
The common cases are implemented recursively, something like:
vector<int> bits;
int N
void find_solution(int pos) {
if (pos == N) {
check_solution();
return;
}
bits[pos] = 0;
find_solution(pos + 1);
bits[pos] = 1;
find_solution(pos + 1);
}

Here is a code clip that computes value sum of every combination of values in a goods array(and value is a global array variable):
fun boom(idx: Int, pre: Int, include: Boolean) {
if (idx < 0) return
boom(idx - 1, pre + if (include) values[idx] else 0, true)
boom(idx - 1, pre + if (include) values[idx] else 0, false)
println(pre + if (include) values[idx] else 0)
}
As you can see, it's recursive. We can inset loops to get Polynomial complexity, and using recursive to get Exponential complexity.

Here are two simple examples in python with Big O/Landau (2^N):
#fibonacci
def fib(num):
if num==0 or num==1:
return num
else:
return fib(num-1)+fib(num-2)
num=10
for i in range(0,num):
print(fib(i))
#tower of Hanoi
def move(disk , from, to, aux):
if disk >= 1:
# from twoer , auxilart
move(disk-1, from, aux, to)
print ("Move disk", disk, "from rod", from_rod, "to rod", to_rod)
move(disk-1, aux, to, from)
n = 3
move(n, 'A', 'B', 'C')

Assuming that a set is a subset of itself, then there are 2ⁿ possible subsets for a set with n elements.
think of it this way. to make a subset, lets take one element. this element has two possibilities in the subset you're creating: present or absent. the same applies for all the other elements in the set. multiplying all these possibilities, you arrive at 2ⁿ.

run time of this Prime Factor function?

I wrote this prime factorization function, can someone explain the runtime to me? It seems fast to me as it continuously decomposes a number into primes without having to check if the factors are prime and runs from 2 to the number in the worst case.
I know that no functions yet can factor primes in polynomial time. Also, how does the run time relate asymptotically to factoring large primes?
function getPrimeFactors(num) {
var factors = [];
for (var i = 2; i <= num; i++) {
if (num % i === 0) {
num = num / i;
factors.push(i);
i--;
}
}
return factors;
}

In your example, if num is prime then it would take exactly num - 1 steps. This would mean that the algorithm's runtime is O(num) (where O stands for a pessimistic case). But in case of algorithm that operate on numbers things get a little bit more tricky (thanks for noticing thegreatcontini and Chris)! We always describe complexity as a function of input size. In this case the input is a number num and it is represented with log(num) bits. So the input size is of log(num). Because num = 2 ^ (log(num)) then your algorithm is of complexity O(2^k) where k = log(num) - size of your input.
This is what makes this problem hard - input is very, very small and any polynomial from num leads to exponential algorithm ...
On a side note #rici is right, you need to check only up to sqrt(num), thus easily reducing the runtime to O(sqrt(num)) or more correctly O(sqrt(2) ^ k).

Rescale a vector of integers

Assume that I have a vector, V, of positive integers. If the sum of the integers are larger than a positive integer N, I want to rescale the integers in V so that the sum is <= N. The elements in V must remain above zero. The length of V is guaranteed to be <= N.
Is there an algorithm to perform this rescaling in linear time?
This is not homework, BTW :). I need to rescale a map from symbols to symbol frequencies to use range encoding.
Some quick thinking and googling has not given a solution to the problem.
EDIT:
Ok, the question was somewhat unclear. "Rescale" means "normalize". That is, transform the integers in V, for example by multiplying them by a constant, to smaller positive integers so the criterion of sum(V) <= N is fulfilled. The better the ratios between the integers are preserved, the better the compression will be.
The problem is open-ended in that way, the method does not need to find the optimal (in, say, a least squares fit sense) way to preserve the ratios, but a "good" one. Setting the entire vector to 1, as suggested, is not acceptable (unless forced). "Good" enough would for example be finding the smallest divisor (defined below) that fulfills the sum criterion.
The following naive algorithm does not work.
Find the current sum(V), Sv
divisor := int(ceil(Sv/N))
Divide each integer in V by divisor, rounding down, but not to less than 1.
This fails on v = [1,1,1,10] with N = 5.
divisor = ceil(13 / 5) = 3.
V := [1,1,1, max(1, floor(10/3)) = 3]
Sv is now 6 > 5.
In this case, the correct normalization is [1,1,1,2]
One algorithm that would work is to do a binary search for divisor (defined above) until the smallest divisor in [1,N] fulfilling the sum criterion is found. Starting with the ceil(Sv/N) guess. This is however, not linear in number of operations, but proportional to len(V)*log(len(V)).
I am starting to think that it is impossible to do well, in linear time, in the general case. I might resort to some sort of heuristic.

Just divide all the integers by their Greatest Common Divisor. You can find the GCD efficiently with multiple applications of Euclid's Algorithm.
d = 0
for x in xs:
d = gcd(d, x)
xs = [x/d for x in xs]
The positive point is that you always have a small as possible representation this way, without throwing away any precision and without needing to choose a specific N. The downside is that if your frequencies are large coprime numbers you will have no choice but to sacrifice precision (and you didn't specify what should be done in this case).

How about this:
Find the current sum(V), Sv
divisor := int(ceil(Sv/(N - |V| + 1))
Divide each integer in V by divisor, rounding up
On v = [1,1,1,10] with N = 5:
divisor = ceil(13 / 2) = 7.
V := [1,1,1, ceil(10/7)) = 2]

I think you should just rescale the part above 1. So, subtract 1 from all values, and V.length from N. Then rescale normally, then add 1 back. You can even do slightly better if you keep running totals as you go along, instead of choosing just one factor, which will usually waste some "number space". Something like this:
public static void rescale(int[] data, int N) {
int sum = 0;
for (int d : data)
sum += d;
if (sum > N) {
int n = N - data.length;
sum -= data.length;
for (int a = 0; a < data.length; a++) {
int toScale = data[a] - 1;
int scaled = Math.round(toScale * (float) n / sum);
data[a] = scaled + 1;
n -= scaled;
sum -= toScale;
}
}
}

This is a problem of 'range normalization', but it's very easy. Suppose that S is the sum of the elements of the vector, and S>=N, then S=dN, for some d>=1. Therefore d=S/N. So just multiply every element of the vector by N/S (i.e. divide by d). The result is a vector with rescaled components which sum is exactly N. This procedure is clearly linear :)

Better ways to implement a modulo operation (algorithm question)

I've been trying to implement a modular exponentiator recently. I'm writing the code in VHDL, but I'm looking for advice of a more algorithmic nature. The main component of the modular exponentiator is a modular multiplier which I also have to implement myself. I haven't had any problems with the multiplication algorithm- it's just adding and shifting and I've done a good job of figuring out what all of my variables mean so that I can multiply in a pretty reasonable amount of time.
The problem that I'm having is with implementing the modulus operation in the multiplier. I know that performing repeated subtractions will work, but it will also be slow. I found out that I could shift the modulus to effectively subtract large multiples of the modulus but I think there might still be better ways to do this. The algorithm that I'm using works something like this (weird pseudocode follows):
result,modulus : integer (n bits) (previously defined)
shiftcount : integer (initialized to zero)
while( (modulus<result) and (modulus(n-1) != 1) ){
modulus = modulus << 1
shiftcount++
}
for(i=shiftcount;i>=0;i--){
if(modulus<result){result = result-modulus}
if(i!=0){modulus = modulus >> 1}
}
So...is this a good algorithm, or at least a good place to start? Wikipedia doesn't really discuss algorithms for implementing the modulo operation, and whenever I try to search elsewhere I find really interesting but incredibly complicated (and often unrelated) research papers and publications. If there's an obvious way to implement this that I'm not seeing, I'd really appreciate some feedback.

I'm not sure what you're calculating there to be honest. You talk about modulo operation, but usually a modulo operation is between two numbers a and b, and its result is the remainder of dividing a by b. Where is the a and b in your pseudocode...?
Anyway, maybe this'll help: a mod b = a - floor(a / b) * b.
I don't know if this is faster or not, it depends on whether or not you can do division and multiplication faster than a lot of subtractions.
Another way to speed up the subtraction approach is to use binary search. If you want a mod b, you need to subtract b from a until a is smaller than b. So basically you need to find k such that:
a - k*b < b, k is min
One way to find this k is a linear search:
k = 0;
while ( a - k*b >= b )
++k;
return a - k*b;
But you can also binary search it (only ran a few tests but it worked on all of them):
k = 0;
left = 0, right = a
while ( left < right )
{
m = (left + right) / 2;
if ( a - m*b >= b )
left = m + 1;
else
right = m;
}
return a - left*b;
I'm guessing the binary search solution will be the fastest when dealing with big numbers.
If you want to calculate a mod b and only a is a big number (you can store b on a primitive data type), you can do it even faster:
for each digit p of a do
mod = (mod * 10 + p) % b
return mod
This works because we can write a as a_n*10^n + a_(n-1)*10^(n-1) + ... + a_1*10^0 = (((a_n * 10 + a_(n-1)) * 10 + a_(n-2)) * 10 + ...
I think the binary search is what you're looking for though.

There are many ways to do it in O(log n) time for n bits; you can do it with multiplication and you don't have to iterate 1 bit at a time. For example,
a mod b = a - floor((a * r)/2^n) * b
where
r = 2^n / b
is precomputed because typically you're using the same b many times. If not, use the standard superconverging polynomial iteration method for reciprocal (iterate 2x - bx^2 in fixed point).
Choose n according to the range you need the result (for many algorithms like modulo exponentiation it doesn't have to be 0..b).
(Many decades ago I thought I saw a trick to avoid 2 multiplications in a row... Update: I think it's Montgomery Multiplication (see REDC algorithm). I take it back, REDC does the same work as the simpler algorithm above. Not sure why REDC was ever invented... Maybe slightly lower latency due to using the low-order result into the chained multiplication, instead of the higher-order result?)
Of course if you have a lot of memory, you can just precompute all the 2^n mod b partial sums for n = log2(b)..log2(a). Many software implementations do this.

If you're using shift-and-add for the multiplication (which is by no means the fastest way) you can do the modulo operation after each addition step. If the sum is greater than the modulus you then subtract the modulus. If you can predict the overflow, you can do the addition and subtraction at the same time. Doing the modulo at each step will also reduce the overall size of your multiplier (same length as input rather than double).
The shifting of the modulus you're doing is getting you most of the way towards a full division algorithm (modulo is just taking the remainder).
EDIT Here is my implementation in Python:
def mod_mul(a,b,m):
result = 0
a = a % m
b = b % m
while (b>0):
if (b&1)!=0:
result += a
if result >= m: result -= m
a = a << 1
if a>=m: a-= m
b = b>>1
return result
This is just modular multiplication (result = a*b mod m). The modulo operations at the top are not needed, but serve as a reminder that the algorithm assumes a and b are less than m.
Of course for modular exponentiation you'll have an outer loop that does this entire operation at each step doing either squaring or multiplication. But I think you knew that.

For modulo itself, I'm not sure. For modulo as part of the larger modular exponential operation, did you look up Montgomery multiplication as mentioned in the wikipedia page on modular exponentiation? It's been a while since I've looked into this type of algorithm, but from what I recall, it's commonly used in fast modular exponentiation.
edit: for what it's worth, your modulo algorithm seems ok at first glance. You're basically doing division which is a repeated subtraction algorithm.

That test (modulus(n-1) != 1) //a bit test?
-seems redundant combined with (modulus<result).
Designing for hardware implementation i would be conscious of the smaller/greater than tests implying more logic (subtraction) than bitwise operations and branching on zero.
If we can do bitwise tests easily, this could be quick:
m=msb_of(modulus)
while( result>0 )
{
r=msb_of(result) //countdown from prev msb onto result
shift=r-m //countdown from r onto modulus or
//unroll the small subtraction
takeoff=(modulus<<(shift)) //or integrate this into count of shift
result=result-takeoff; //necessary subtraction
if(shift!=0 && result<0)
{ result=result+(takeoff>>1); }
} //endwhile
if(result==0) { return result }
else { return result+takeoff }
(code untested may contain gotchas)
result is repetively decremented by modulus shifted to match at most significant bits.
After each subtraction: result has a ~50/50 chance of loosing more than 1 msb. It also has ~50/50 chance of going negative,
addition of half what was subtracted will always put it into positive again. > it should be put back in positive if shift was not=0
The working loop exits when result is underrun and 'shift' was 0.

Generating shuffled range using a PRNG rather than shuffling

Is there any known algorithm that can generate a shuffled range [0..n) in linear time and constant space (when output produced iteratively), given an arbitrary seed value?
Assume n may be large, e.g. in the many millions, so a requirement to potentially produce every possible permutation is not required, not least because it's infeasible (the seed value space would need to be huge). This is also the reason for a requirement of constant space. (So, I'm specifically not looking for an array-shuffling algorithm, as that requires that the range is stored in an array of length n, and so would use linear space.)
I'm aware of question 162606, but it doesn't present an answer to this particular question - the mappings from permutation indexes to permutations given in that question would require a huge seed value space.
Ideally, it would act like a LCG with a period and range of n, but the art of selecting a and c for an LCG is subtle. Simply satisfying the constraints for a and c in a full period LCG may satisfy my requirements, but I am wondering if there are any better ideas out there.

Based on Jason's answer, I've made a simple straightforward implementation in C#. Find the next largest power of two greater than N. This makes it trivial to generate a and c, since c needs to be relatively prime (meaning it can't be divisible by 2, aka odd), and (a-1) needs to be divisible by 2, and (a-1) needs to be divisible by 4. Statistically, it should take 1-2 congruences to generate the next number (since 2N >= M >= N).
class Program
{
IEnumerable<int> GenerateSequence(int N)
{
Random r = new Random();
int M = NextLargestPowerOfTwo(N);
int c = r.Next(M / 2) * 2 + 1; // make c any odd number between 0 and M
int a = r.Next(M / 4) * 4 + 1; // M = 2^m, so make (a-1) divisible by all prime factors, and 4
int start = r.Next(M);
int x = start;
do
{
x = (a * x + c) % M;
if (x < N)
yield return x;
} while (x != start);
}
int NextLargestPowerOfTwo(int n)
{
n |= (n >> 1);
n |= (n >> 2);
n |= (n >> 4);
n |= (n >> 8);
n |= (n >> 16);
return (n + 1);
}
static void Main(string[] args)
{
Program p = new Program();
foreach (int n in p.GenerateSequence(1000))
{
Console.WriteLine(n);
}
Console.ReadKey();
}
}

Here is a Python implementation of the Linear Congruential Generator from FryGuy's answer. Because I needed to write it anyway and thought it might be useful for others.
import random
import math
def lcg(start, stop):
N = stop - start
# M is the next largest power of 2
M = int(math.pow(2, math.ceil(math.log(N+1, 2))))
# c is any odd number between 0 and M
c = random.randint(0, M/2 - 1) * 2 + 1
# M=2^m, so make (a-1) divisible by all prime factors and 4
a = random.randint(0, M/4 - 1) * 4 + 1
first = random.randint(0, M - 1)
x = first
while True:
x = (a * x + c) % M
if x < N:
yield start + x
if x == first:
break
if __name__ == "__main__":
for x in lcg(100, 200):
print x,

Sounds like you want an algorithm which is guaranteed to produce a cycle from 0 to n-1 without any repeats. There are almost certainly a whole bunch of these depending on your requirements; group theory would be the most helpful branch of mathematics if you want to delve into the theory behind it.
If you want fast and don't care about predictability/security/statistical patterns, an LCG is probably the simplest approach. The wikipedia page you linked to contains this (fairly simple) set of requirements:
The period of a general LCG is at most
m, and for some choices of a much less
than that. The LCG will have a full
period if and only if:
c and m are relatively prime,
a - 1 is divisible by all prime factors of m
a - 1 is a multiple of 4 if m is a multiple of 4
Alternatively, you could choose a period N >= n, where N is the smallest value that has convenient numerical properties, and just discard any values produced between n and N-1. For example, the lowest N = 2k - 1 >= n would let you use linear feedback shift registers (LFSR). Or find your favorite cryptographic algorithm (RSA, AES, DES, whatever) and given a particular key, figure out the space N of numbers it permutes, and for each step apply encryption once.
If n is small but you want the security to be high, that's probably the trickiest case, as any sequence S is likely to have a period N much higher than n, but is also nontrivial to derive a nonrepeating sequence of numbers with a shorter period than N. (e.g. if you could take the output of S mod n and guarantee nonrepeating sequence of numbers, that would give information about S that an attacker might use)

See my article on secure permutations with block ciphers for one way to do it.

Look into Linear Feedback Shift Registers, they can be used for exactly this.
The short way of explaining them is that you start with a seed and then iterate using the formula
x = (x << 1) | f(x)
where f(x) can only return 0 or 1.
If you choose a good function f, x will cycle through all values between 1 and 2^n-1 (where n is some number), in a good, pseudo-random way.
Example functions can be found here, e.g. for 63 values you can use
f(x) = ((x >> 6) & 1) ^ ((x >> 5) & 1)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio