Limited space average computation - algorithm

Let's say I have N integers, where N can get huge, but each int is guaranteed to be between 0 and some cap M, where M fits easily in a signed 32-bit field.
If I want to compute the average of these N integers, I can't always just sum and divide them all in the same signed 32-bit space, because the numerator risks overflowing if N is too large. One solution is to use 64-bit fields for the computation so that larger N can be handled, but this doesn't scale: if M were a large 64-bit integer instead, the same problem would arise.
Does anyone know of an algorithm (preferably O(N)) that can compute the average of a list of positive integers in the same bit-space, without doing something cheap like using two integers to simulate a larger one?

Supposing you know N in advance, you can keep two variables: one is the answer so far (the running sum divided by N), and the other is the remainder.
For example, in C++:
int ans = 0, remainder = 0;
for (int i=0;i<N;i++) {
remainder += input[i]; // update remainder so far
ans += remainder/N; // move what we can from remainder into ans
remainder%=N; // calculate what's left of remainder
}
At the end of the loop, the answer is found in ans, with a remainder in remainder (if you need a rounding method other than truncation).
This example works as long as M + N fits in a 32-bit int: remainder is below N before each input (at most M) is added to it.
Note that this should work for positive and negative integers, because in C++ the / operator truncates toward zero and % is a remainder operator (not really a modulo operator).
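A minimal Python sketch of the quotient/remainder loop above, assuming non-negative inputs (Python's // floors instead of truncating, so the negative-number remark does not carry over unchanged). Python integers never overflow, so the function also reports the largest value the remainder ever reaches, to show that it stays below M + N. The name `average_no_overflow` is made up for illustration.

```python
def average_no_overflow(values):
    """Return (ans, rem, peak) with ans * len(values) + rem == sum(values).

    ans collects whole multiples of N and rem keeps the leftover, so the
    only growing intermediate is rem + value, which stays below M + N.
    """
    n = len(values)
    ans, rem, peak = 0, 0, 0
    for v in values:
        rem += v                  # at most (N - 1) + M
        peak = max(peak, rem)
        ans += rem // n           # move whole multiples of N into ans
        rem %= n                  # keep what is left over (0 <= rem < N)
    return ans, rem, peak


vals = [2**31 - 1, 5, 2**31 - 2, 7]
ans, rem, peak = average_no_overflow(vals)
```

Here the true sum is 2**32 + 9, which would not fit in 32 bits, while every intermediate stays at or below 2**31 - 1.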

You can calculate a running average. If you have the average A of N elements and you add another element E, the new average is (A*N+E)/(N+1). By the distributive property of division over addition, this is equivalent to (A*N)/(N+1) + E/(N+1). If A*N overflows, you can use the associative property of multiplication and division to rewrite the first term as A*(N/(N+1)). This needs floating-point (or fixed-point) arithmetic: in integer arithmetic N/(N+1) truncates to 0.
So the algorithm is:
n = 0
avg = 0
for each i in list
avg = avg*(n/(n+1)) + i/(n+1)
n = n+1
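A direct Python transcription of that pseudocode, with the element variable renamed to `x`. It uses floats on purpose, since `n/(n+1)` would be 0 in integer arithmetic, so expect ordinary floating-point rounding in the result; `running_average` is a hypothetical name.

```python
def running_average(values):
    """Incremental mean: avg <- avg * (n / (n + 1)) + x / (n + 1)."""
    avg = 0.0
    for n, x in enumerate(values):   # n = number of elements already averaged
        avg = avg * (n / (n + 1)) + x / (n + 1)
    return avg


print(running_average([1, 2, 3, 4]))
```

No intermediate is ever larger than the largest input, which is the point of the rearrangement.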

Related

Bijection on the integers below x

I'm working on image processing, and I'm writing a parallel algorithm that iterates over all the pixels in an image and changes the surrounding pixels based on each pixel's value. In this algorithm, minor non-determinism is acceptable, but I'd rather minimize it by only querying distant pixels simultaneously. Could someone give me an algorithm that bijectively maps the integers below n to the integers below n, in a fast and simple manner, such that two integers that are close to each other before mapping are likely to be far apart after application?
For simplicity let's say n is a power of two. Could you simply reverse the order of the least significant log2(n) bits of the number?
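A small Python sketch of that bit-reversal idea, assuming n = 2**bits; `bit_reverse` is a hypothetical helper name. An even index and the next odd index differ only in the lowest bit, which becomes the highest bit after reversal, so such pairs always land exactly n/2 apart.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (a bijection on range(2**bits))."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # append i's lowest bit to r
        i >>= 1
    return r


order = [bit_reverse(i, 4) for i in range(16)]
# order == [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```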
Considering the pixels to be a one-dimensional array, you could use a hash function j = i*p % n, where n is the zero-based index of the last pixel and p is a prime number chosen to place the pixel far enough away at each step. % is the remainder operator in C; mathematically I'd write j(i) = i p (mod n).
So if you want to jump at least 10 rows at each iteration, choose p > 10 * w where w is the screen width. You'll want to have a lookup table for p as a function of n and w of course.
Note that j hits every pixel as i goes from 0 to n, provided p shares no prime factor with the modulus.
CORRECTION: Use (mod (n + 1)), not (mod n). The last index is n, which cannot be reached using mod n since n (mod n) == 0.
Apart from reversing the bit order, you can use modular multiplication. Say N is a prime number (like 521); then for all x = 0..520 you define the function:
f(x) = x * fac mod N
which is a bijection on 0..520. fac is an arbitrary number other than 0 and 1 (mod N). For example, for N = 521 and fac = 122 you get the following mapping:
[Figure: plot of f(x) against x for N = 521, fac = 122]
As you can see, it is quite uniform, and not many points are near the diagonal; there are some, but they are a small proportion.
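Both modular answers above build the same kind of permutation, x -> x * factor mod size. A Python sketch with the hypothetical name `multiplicative_permutation`: the map is a bijection exactly when gcd(factor, size) == 1, which is automatic for a prime size like 521 and is the condition to check in the pixel version, where size = n + 1 and factor = p.

```python
from math import gcd


def multiplicative_permutation(size, factor):
    """x -> x * factor mod size for every x in range(size)."""
    if gcd(factor, size) != 1:
        # a shared factor means some outputs repeat and others are never hit
        raise ValueError("factor must be coprime with size")
    return [(x * factor) % size for x in range(size)]


perm = multiplicative_permutation(521, 122)
```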

Rescale a vector of integers

Assume that I have a vector, V, of positive integers. If the sum of the integers is larger than a positive integer N, I want to rescale the integers in V so that the sum is <= N. The elements in V must remain above zero. The length of V is guaranteed to be <= N.
Is there an algorithm to perform this rescaling in linear time?
This is not homework, BTW :). I need to rescale a map from symbols to symbol frequencies to use range encoding.
Some quick thinking and googling has not given a solution to the problem.
EDIT:
Ok, the question was somewhat unclear. "Rescale" means "normalize". That is, transform the integers in V, for example by multiplying them by a constant, to smaller positive integers so the criterion of sum(V) <= N is fulfilled. The better the ratios between the integers are preserved, the better the compression will be.
The problem is open-ended in that the method does not need to find the optimal way (in, say, a least-squares sense) to preserve the ratios, just a "good" one. Setting the entire vector to 1, as suggested, is not acceptable (unless forced). "Good" enough would, for example, be finding the smallest divisor (defined below) that fulfills the sum criterion.
The following naive algorithm does not work.
Find the current sum(V), Sv
divisor := int(ceil(Sv/N))
Divide each integer in V by divisor, rounding down, but not to less than 1.
This fails on v = [1,1,1,10] with N = 5.
divisor = ceil(13 / 5) = 3.
V := [1,1,1, max(1, floor(10/3)) = 3]
Sv is now 6 > 5.
In this case, the correct normalization is [1,1,1,2]
One algorithm that would work is to binary search for the divisor (defined above) until the smallest divisor in [1,N] fulfilling the sum criterion is found, starting with ceil(Sv/N) as the first guess. This is, however, not linear in the number of operations: it is proportional to len(V)*log(N).
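A Python sketch of that binary search, with hypothetical helper names. It relies on the scaled sum being non-increasing in the divisor, and on the divisor max(V) always working (every element becomes 1, and len(V) <= N), so the search range can be [1, max(V)].

```python
def scaled_sum(v, d):
    """Sum after dividing each element by d, rounding down but not below 1."""
    return sum(max(1, x // d) for x in v)


def rescale_by_smallest_divisor(v, n):
    """Smallest divisor d with scaled_sum(v, d) <= n, applied to v."""
    lo, hi = 1, max(v)
    while lo < hi:
        mid = (lo + hi) // 2
        if scaled_sum(v, mid) <= n:
            hi = mid          # mid works; look for a smaller divisor
        else:
            lo = mid + 1      # mid is too small
    return [max(1, x // lo) for x in v]


print(rescale_by_smallest_divisor([1, 1, 1, 10], 5))
```

On the example from the question this finds divisor 4 and returns [1, 1, 1, 2].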
I am starting to think that it is impossible to do well, in linear time, in the general case. I might resort to some sort of heuristic.
Just divide all the integers by their Greatest Common Divisor. You can find the GCD efficiently with multiple applications of Euclid's Algorithm.
from math import gcd

d = 0
for x in xs:
    d = gcd(d, x)
xs = [x // d for x in xs]  # exact integer division; the values stay ints
The advantage is that you always get the smallest possible representation this way, without throwing away any precision and without needing to choose a specific N. The downside is that if your frequencies are large coprime numbers, you will have no choice but to sacrifice precision (and you didn't specify what should be done in that case).
How about this:
Find the current sum(V), Sv
divisor := int(ceil(Sv / (N - |V| + 1)))
Divide each integer in V by divisor, rounding up
On v = [1,1,1,10] with N = 5:
divisor = ceil(13 / 2) = 7.
V := [1,1,1, ceil(10/7) = 2]
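Why this always fits: with d = ceil(Sv / (N - |V| + 1)), each ceil(v/d) is less than v/d + 1, so the new sum is less than Sv/d + |V| <= (N - |V| + 1) + |V| = N + 1, and being an integer it is at most N. A Python sketch, assuming len(V) <= N as the question guarantees; the function name is made up.

```python
def rescale_ceil(v, n):
    """Divide every element by ceil(sum(v) / (n - len(v) + 1)), rounding up."""
    s = sum(v)
    if s <= n:
        return list(v)                      # already within the budget
    d = -(-s // (n - len(v) + 1))           # integer ceiling division
    return [-(-x // d) for x in v]          # ceil(x / d), still >= 1


print(rescale_ceil([1, 1, 1, 10], 5))
```

Rounding up keeps every element at least 1 without a separate max(1, ...) step.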
I think you should just rescale the part above 1. So, subtract 1 from all values, and V.length from N. Then rescale normally, then add 1 back. You can even do slightly better if you keep running totals as you go along, instead of choosing just one factor, which will usually waste some "number space". Something like this:
public static void rescale(int[] data, int N) {
int sum = 0;
for (int d : data)
sum += d;
if (sum > N) {
int n = N - data.length;
sum -= data.length;
for (int a = 0; a < data.length; a++) {
int toScale = data[a] - 1;
int scaled = Math.round(toScale * (float) n / sum);
data[a] = scaled + 1;
n -= scaled;
sum -= toScale;
}
}
}
This is a problem of 'range normalization', but it's very easy. Suppose that S is the sum of the elements of the vector and S >= N; then S = dN for some d >= 1, so d = S/N. Just multiply every element of the vector by N/S (i.e. divide by d). Before rounding, the rescaled components sum to exactly N; once they are rounded to integers and kept at 1 or more, the sum is only close to N, as the counterexample in the question shows. This procedure is clearly linear :)

Better ways to implement a modulo operation (algorithm question)

I've been trying to implement a modular exponentiator recently. I'm writing the code in VHDL, but I'm looking for advice of a more algorithmic nature. The main component of the modular exponentiator is a modular multiplier, which I also have to implement myself. I haven't had any problems with the multiplication algorithm; it's just adding and shifting, and I've done a good job of figuring out what all of my variables mean so that I can multiply in a pretty reasonable amount of time.
The problem that I'm having is with implementing the modulus operation in the multiplier. I know that performing repeated subtractions will work, but it will also be slow. I found out that I could shift the modulus to effectively subtract large multiples of the modulus but I think there might still be better ways to do this. The algorithm that I'm using works something like this (weird pseudocode follows):
result,modulus : integer (n bits) (previously defined)
shiftcount : integer (initialized to zero)
while( (modulus<result) and (modulus(n-1) != 1) ){
modulus = modulus << 1
shiftcount++
}
for(i=shiftcount;i>=0;i--){
if(modulus<result){result = result-modulus}
if(i!=0){modulus = modulus >> 1}
}
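For comparison, a runnable Python sketch of the same shift-and-subtract (binary long division) idea. It aligns the modulus with the top bit of result using bit lengths instead of a compare loop, and subtracts whenever the shifted modulus is <= result; the question's pseudocode appears to use modulus<result, which would leave result == modulus unreduced. The function name is hypothetical.

```python
def shift_subtract_mod(result, modulus):
    """Remainder of result / modulus using only shifts, compares and subtractions."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    # line the modulus up with result's most significant bit
    shift = max(0, result.bit_length() - modulus.bit_length())
    m = modulus << shift
    for _ in range(shift + 1):
        if m <= result:       # subtract this shifted copy if it fits
            result -= m
        m >>= 1               # move on to the next lower bit position
    return result


print(shift_subtract_mod(1000, 7))
```

Before each step result is below twice the current shifted modulus, so one conditional subtraction per bit position is enough.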
So...is this a good algorithm, or at least a good place to start? Wikipedia doesn't really discuss algorithms for implementing the modulo operation, and whenever I try to search elsewhere I find really interesting but incredibly complicated (and often unrelated) research papers and publications. If there's an obvious way to implement this that I'm not seeing, I'd really appreciate some feedback.
I'm not sure what you're calculating there to be honest. You talk about modulo operation, but usually a modulo operation is between two numbers a and b, and its result is the remainder of dividing a by b. Where is the a and b in your pseudocode...?
Anyway, maybe this'll help: a mod b = a - floor(a / b) * b.
I don't know if this is faster or not, it depends on whether or not you can do division and multiplication faster than a lot of subtractions.
Another way to speed up the subtraction approach is to use binary search. If you want a mod b, you need to subtract b from a until a is smaller than b. So basically you need to find k such that:
a - k*b < b, k is min
One way to find this k is a linear search:
k = 0;
while ( a - k*b >= b )
++k;
return a - k*b;
But you can also binary search it (only ran a few tests but it worked on all of them):
k = 0;
left = 0, right = a
while ( left < right )
{
m = (left + right) / 2;
if ( a - m*b >= b )
left = m + 1;
else
right = m;
}
return a - left*b;
I'm guessing the binary search solution will be the fastest when dealing with big numbers.
If you want to calculate a mod b and only a is a big number (b fits in a primitive data type), you can do it even faster:
mod = 0
for each digit p of a, from most significant to least, do
mod = (mod * 10 + p) % b
return mod
This works because we can write a as a_n*10^n + a_(n-1)*10^(n-1) + ... + a_0*10^0 = (((a_n*10 + a_(n-1))*10 + a_(n-2))*10 + ...)*10 + a_0, and taking the remainder after each step does not change the final result mod b.
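The digit loop above as a Python sketch, assuming a arrives as a decimal string; the function name is hypothetical. Every intermediate stays below 10 * b, so only b needs to fit in a machine word.

```python
def mod_of_decimal_string(digits, b):
    """a mod b for a given as a string of decimal digits (Horner's rule)."""
    r = 0
    for ch in digits:                 # most significant digit first
        r = (r * 10 + int(ch)) % b    # r < b, so r * 10 + 9 < 10 * b
    return r


print(mod_of_decimal_string("123456789012345678901234567890", 97))
```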
I think the binary search is what you're looking for though.
There are many ways to do it in O(log n) time for n bits; you can do it with multiplication and you don't have to iterate 1 bit at a time. For example,
a mod b = a - floor((a * r)/2^n) * b
where
r = floor(2^n / b)
is precomputed because typically you're using the same b many times. If not, use the standard superconverging polynomial iteration method for reciprocal (iterate 2x - bx^2 in fixed point).
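Choose n according to the range you need for the result (for many algorithms, like modular exponentiation, it doesn't have to be 0..b). If a < 2^n, the formula gives a value in [0, 2b), so at most one extra subtraction of b is needed.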
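The answer does not name it, but this precomputed-reciprocal scheme is what is usually called Barrett reduction. A Python sketch under the assumption that inputs satisfy a < 2**k (k plays the role of n above). Under that assumption the estimated quotient is never too large and at most one too small, so a single conditional subtraction finishes the job. Function names are hypothetical.

```python
def barrett_setup(b, k):
    """Precompute r = floor(2**k / b) once per modulus b."""
    return (1 << k) // b


def barrett_mod(a, b, k, r):
    """a mod b for 0 <= a < 2**k, using a - floor(a * r / 2**k) * b."""
    q = (a * r) >> k      # estimated quotient: floor(a/b) or floor(a/b) - 1
    rem = a - q * b       # therefore 0 <= rem < 2 * b
    if rem >= b:
        rem -= b          # one correction step is enough
    return rem


b, k = 97, 20
r = barrett_setup(b, k)
print(barrett_mod(123456, b, k, r))
```

In hardware the shift by k is free wiring, so the reduction costs two multiplications and a subtraction.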
(Many decades ago I thought I saw a trick to avoid 2 multiplications in a row... Update: I think it's Montgomery Multiplication (see REDC algorithm). I take it back, REDC does the same work as the simpler algorithm above. Not sure why REDC was ever invented... Maybe slightly lower latency due to using the low-order result into the chained multiplication, instead of the higher-order result?)
Of course if you have a lot of memory, you can just precompute all the 2^n mod b partial sums for n = log2(b)..log2(a). Many software implementations do this.
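A sketch of that table-driven idea in Python. For simplicity the table starts at 2^0 rather than 2^log2(b), which only adds a few small entries, and a must have no more bits than the table has entries. Function names are hypothetical.

```python
def pow2_mod_table(b, nbits):
    """table[i] = 2**i mod b for i in range(nbits), built incrementally."""
    table, t = [], 1 % b
    for _ in range(nbits):
        table.append(t)
        t = (t * 2) % b
    return table


def mod_via_table(a, b, table):
    """a mod b as the sum of 2**i mod b over the set bits i of a."""
    acc, i = 0, 0
    while a:
        if a & 1:
            acc += table[i]
            if acc >= b:
                acc -= b      # both terms are below b, so one subtraction suffices
        a >>= 1
        i += 1
    return acc


b = 1_000_003
table = pow2_mod_table(b, 256)
print(mod_via_table(7**90, b, table))
```

After the table is built, each set bit of a costs one addition and at most one subtraction.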
If you're using shift-and-add for the multiplication (which is by no means the fastest way) you can do the modulo operation after each addition step. If the sum is greater than the modulus you then subtract the modulus. If you can predict the overflow, you can do the addition and subtraction at the same time. Doing the modulo at each step will also reduce the overall size of your multiplier (same length as input rather than double).
The shifting of the modulus you're doing is getting you most of the way towards a full division algorithm (modulo is just taking the remainder).
EDIT: Here is my implementation in Python:
def mod_mul(a, b, m):
    result = 0
    a = a % m
    b = b % m
    while b > 0:
        if (b & 1) != 0:
            result += a
            if result >= m: result -= m
        a = a << 1
        if a >= m: a -= m
        b = b >> 1
    return result
This is just modular multiplication (result = a*b mod m). The modulo operations at the top are not needed, but serve as a reminder that the algorithm assumes a and b are less than m.
Of course for modular exponentiation you'll have an outer loop that does this entire operation at each step doing either squaring or multiplication. But I think you knew that.
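To illustrate that outer loop, a Python sketch of right-to-left square-and-multiply built only on the answer's mod_mul (repeated here so the block runs on its own; `mod_pow` is a hypothetical name). Python's built-in three-argument pow is used only to check it.

```python
def mod_mul(a, b, m):
    """Shift-and-add modular multiplication (a * b mod m)."""
    result = 0
    a %= m
    b %= m
    while b > 0:
        if b & 1:
            result += a
            if result >= m:
                result -= m
        a <<= 1
        if a >= m:
            a -= m
        b >>= 1
    return result


def mod_pow(base, exp, m):
    """base**exp mod m: square at every step, multiply on each set bit of exp."""
    result = 1 % m
    base %= m
    while exp > 0:
        if exp & 1:
            result = mod_mul(result, base, m)
        base = mod_mul(base, base, m)
        exp >>= 1
    return result


print(mod_pow(3, 200, 1000007))
```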
For modulo itself, I'm not sure. For modulo as part of the larger modular exponential operation, did you look up Montgomery multiplication as mentioned in the wikipedia page on modular exponentiation? It's been a while since I've looked into this type of algorithm, but from what I recall, it's commonly used in fast modular exponentiation.
edit: for what it's worth, your modulo algorithm seems ok at first glance. You're basically doing division which is a repeated subtraction algorithm.
That test, (modulus(n-1) != 1), which I take to be a bit test, seems redundant combined with (modulus<result).
When designing for a hardware implementation, I would be conscious of the smaller/greater-than tests implying more logic (a subtraction) than bitwise operations and branching on zero.
If we can do bitwise tests easily, this could be quick:
m=msb_of(modulus)
while( result>0 )
{
r=msb_of(result) //countdown from prev msb onto result
shift=r-m //countdown from r onto modulus or
//unroll the small subtraction
takeoff=(modulus<<(shift)) //or integrate this into count of shift
result=result-takeoff; //necessary subtraction
if(shift!=0 && result<0)
{ result=result+(takeoff>>1); }
} //endwhile
if(result==0) { return result }
else { return result+takeoff }
(Code untested; may contain gotchas.)
result is repeatedly decremented by the modulus, shifted to line up with result's most significant bit.
After each subtraction, result has a ~50/50 chance of losing more than one MSB. It also has a ~50/50 chance of going negative;
adding back half of what was subtracted always makes it positive again, which is needed whenever shift was not 0.
The working loop exits when result underruns and shift was 0.

Reverse factorial

Well, we all know that if N is given, it's easy to calculate N!. But what about the inverse?
N! is given and you are asked to find N. Is that possible? I'm curious.
1. Set X=1.
2. Generate F=X!
3. Is F = the input? If yes, then X is N.
4. If not, then set X=X+1, then start again at #2.
You can optimize by using the previous result of F to compute the new F (new F = new X * old F).
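The steps above as a Python sketch (hypothetical name), keeping the running factorial as suggested so each step costs one multiplication. It returns None when the input is not a factorial.

```python
def inverse_factorial_linear(q):
    """Return n >= 1 with n! == q, or None if q is not a factorial."""
    x, f = 1, 1
    while f < q:
        x += 1
        f *= x          # f == x! after this line (new F = new X * old F)
    return x if f == q else None


print(inverse_factorial_linear(3628800))
```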
It's just as fast as going the opposite direction, if not faster, given that division generally takes longer than multiplication. A given factorial A! is guaranteed to have all integers less than A as factors in addition to A, so you'd spend just as much time factoring those out as you would just computing a running factorial.
If you have Q=N! in binary, count the trailing zeros. Call this number J.
If N is 2K or 2K+1, then J is equal to 2K minus the number of 1's in the binary representation of 2K. So start from J and add 1 over and over until the number of 1's you have added is equal to the number of 1's in the result.
Now you know 2K, and N is either 2K or 2K+1. To tell which one it is, take a prime factor of 2K+1 (the biggest, or any prime factor really), count how many times it divides (2K+1)!, and check whether Q is divisible by that power, as it would be if Q=(2K+1)!.
For example, suppose Q (in binary) is
1111001110111010100100110000101011001111100000110110000000000000000000
(Sorry it's so small, but I don't have tools handy to manipulate larger numbers.)
There are 19 trailing zeros, which is
10011
Now increment:
1: 10100
2: 10101
3: 10110 bingo!
So N is 22 or 23. I need a prime factor of 23, and, well, I have to pick 23 (it happens that 2K+1 is prime, but I didn't plan that and it isn't needed). 23^1 divides 23!, but it doesn't divide Q, so
N=22
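A Python sketch of the trailing-zeros method, assuming the input really is N! for some N >= 1. It uses the fact (Legendre's formula for p = 2) that X! has exactly X - popcount(X) trailing zero bits. To choose between 2K and 2K+1 it takes the smallest prime factor of 2K+1 rather than the biggest; the answer notes that any prime factor works. Helper names are hypothetical.

```python
def _smallest_prime_factor(m):
    p = 2
    while p * p <= m:
        if m % p == 0:
            return p
        p += 1
    return m


def _prime_power_in_factorial(n, p):
    """Exponent of the prime p in n! (Legendre's formula)."""
    e, q = 0, p
    while q <= n:
        e += n // q
        q *= p
    return e


def inverse_factorial_trailing_zeros(q):
    """Assumes q == N! for some N >= 1; returns N."""
    if q == 1:
        return 1                              # 0! == 1! == 1
    j = (q & -q).bit_length() - 1             # number of trailing zero bits
    x = j
    while x - bin(x).count("1") != j:         # smallest x with x - popcount(x) == j
        x += 1                                # this x is 2K; N is 2K or 2K + 1
    p = _smallest_prime_factor(x + 1)
    if q % p ** _prime_power_in_factorial(x + 1, p) == 0:
        return x + 1
    return x


print(inverse_factorial_trailing_zeros(1124000727777607680000))  # 22!
```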
int inverse_factorial(int factorial){
int current = 1;
while (factorial > current) {
if (factorial % current) {
return -1; //not divisible
}
factorial /= current;
++current;
}
if (current == factorial) {
return current;
}
return -1;
}
Yes. Let's call your input x. For small values of x, you can just try all values of n and see if n! = x. For larger x, you can binary-search over n to find the right n (if one exists). Note that we have n! ≈ e^(n ln n - n) (this is Stirling's approximation), so you know approximately where to look.
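A Python sketch of the binary search, using exact big-integer comparisons via math.factorial. Instead of Stirling's approximation it uses the cruder bound n! >= 2**(n-1), which already limits any solution to n <= x.bit_length(); a Stirling-based estimate could narrow the range further. The function name is hypothetical.

```python
from math import factorial


def inverse_factorial_bisect(x):
    """Return n >= 1 with n! == x, or None, by binary search on n."""
    lo, hi = 1, max(1, x.bit_length())
    while lo < hi:                      # smallest n in [lo, hi] with n! >= x
        mid = (lo + hi) // 2
        if factorial(mid) >= x:
            hi = mid
        else:
            lo = mid + 1
    return lo if factorial(lo) == x else None


print(inverse_factorial_bisect(factorial(30)))
```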
The problem of course, is that very few numbers are factorials; so your question makes sense for only a small set of inputs. If your input is small (e.g. fits in a 32-bit or 64-bit integer) a lookup table would be the best solution.
(You could of course consider the more general problem of inverting the Gamma function. Again, binary search would probably be the best way, rather than something analytic. I'd be glad to be shown wrong here.)
Edit: Actually, in the case where you don't know for sure that x is a factorial number, you may not gain all that much (or anything) with binary search using Stirling's approximation or the Gamma function, over simple solutions. The inverse factorial grows slower than logarithmic (this is because the factorial is superexponential), and you have to do arbitrary-precision arithmetic to find factorials and multiply those numbers anyway.
For instance, see Draco Ater's answer for an idea that (when extended to arbitrary-precision arithmetic) will work for all x. Even simpler, and probably even faster because multiplication is faster than division, is Dav's answer which is the most natural algorithm... this problem is another triumph of simplicity, it appears. :-)
Well, if you know that M is really the factorial of some integer, then you can use
n! = Gamma(n+1) = sqrt(2*PI) * exp(-n) * n^(n+1/2) * (1 + O(1/n))
You can solve this (or, really, solve ln(n!) = ln Gamma(n+1)) and find the nearest integer.
It is still nonlinear, but you can get an approximate solution by iteration easily (in fact, I expect the n^(n+1/2) factor is enough).
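Taking logarithms as suggested, a Python sketch that solves lgamma(t + 1) = ln(x) for real t >= 1 (math.lgamma is the log of the absolute value of Gamma), rounds to the nearest integer, and confirms with an exact factorial. Bisection is a swapped-in choice for the unspecified iteration, and the function name is hypothetical.

```python
import math


def inverse_factorial_lgamma(x):
    """Return n with n! == x, or None, by solving ln Gamma(t + 1) = ln x."""
    if x < 1:
        return None
    target = math.log(x)               # math.log accepts arbitrarily large ints
    lo, hi = 1.0, 2.0
    while math.lgamma(hi + 1) < target:
        hi *= 2                        # widen the bracket until it holds the root
    for _ in range(100):
        mid = (lo + hi) / 2
        if math.lgamma(mid + 1) < target:
            lo = mid
        else:
            hi = mid
    n = round(hi)
    return n if math.factorial(n) == x else None


print(inverse_factorial_lgamma(math.factorial(170)))
```

Consecutive factorials differ by a factor of at least 2 in this range, so rounding picks the right candidate well within float precision for moderate n.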
Multiple ways. Use lookup tables, use binary search, use a linear search...
Lookup tables is an obvious one:
for (i = 0; i < MAX; ++i)
Lookup[i!] = i; // you can calculate i! incrementally in O(1)
You could implement this using hash tables for example, or if you use C++/C#/Java, they have their own hash table-like containers.
This is useful if you have to do this a lot of times and each time it has to be fast, but you can afford to spend some time building this table.
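A Python version of the lookup table, using a dict as the hash table and building each factorial from the previous one. It starts at 1! because 0! and 1! are both 1. The function name is hypothetical.

```python
def build_factorial_lookup(max_n):
    """Map k! -> k for k = 1..max_n."""
    table, f = {}, 1
    for k in range(1, max_n + 1):
        f *= k              # f == k!, computed incrementally in O(1) per step
        table[f] = k
    return table


lookup = build_factorial_lookup(20)
print(lookup.get(120), lookup.get(121))
```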
Binary search: guess m = (1 + N!) / 2. Is m! larger than N!? If yes, narrow the search to [1, m]; otherwise narrow it to [m + 1, N!]. Apply this logic recursively.
Of course, these numbers might be very big and you might end up doing a lot of unwanted operations. A better idea is to search between 1 and sqrt(N!) using binary search, or try to find even better approximations, though this might not be easy. Consider studying the gamma function.
Linear search: Probably the best in this case. Calculate 1*2*3*...*k until the product is equal to N! and output k.
If the input number really is N!, it's fairly simple to calculate N.
A naive approach computing factorials will be too slow, due to the overhead of big integer arithmetic. Instead we can notice that, when N ≥ 7, each factorial can be uniquely identified by its length (i.e. number of digits).
The length (number of decimal digits) of a positive integer x can be computed as floor(log10(x)) + 1.
Product rule of logarithms: log(a*b) = log(a) + log(b)
By using the two facts above, we can say that the length of N! is:
length(N!) = floor(log10(1) + log10(2) + ... + log10(N)) + 1
which can be computed by simply adding log10(i) until we reach the length of our input number, since log(1*2*3*...*n) = log(1) + log(2) + log(3) + ... + log(n).
This C++ code should do the trick:
double result = 0;
for (int i = 1; i <= 1000000; ++i) { // should work up to 1000000! (about 5.6 million digits)
    result += log10(i);
    if ((size_t)result + 1 == inputNumber.size()) { // assuming inputNumber is a std::string holding N!
        std::cout << i << std::endl;
        break;
    }
}
(Remember to handle the cases where n < 7 separately; basic factorial calculation is fine there.)
Complete code: https://pastebin.com/9EVP7uJM
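A Python sketch of the digit-length approach, assuming the input is a decimal string holding some N! (it does not verify that). Cases below 7 are matched directly, as the note above suggests; the function name is hypothetical. Floating-point error in the running log sum is negligible for moderate N but is worth keeping in mind for very large inputs.

```python
import math


def inverse_factorial_by_length(digits):
    """N for a decimal string digits == str(N!), identified by its length."""
    small = {"1": 1, "2": 2, "6": 3, "24": 4, "120": 5, "720": 6}
    if digits in small:
        return small[digits]
    log_sum, n = 0.0, 0
    while True:
        n += 1
        log_sum += math.log10(n)                # log10(n!) so far
        length = int(log_sum) + 1               # number of digits of n!
        if length == len(digits):
            return n
        if length > len(digits):
            return None                         # no factorial has this length


print(inverse_factorial_by_length(str(math.factorial(100))))
```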
Here is some clojure code:
(defn- reverse-fact-help [n div]
(cond (not (= 0 (rem n div))) nil
(= 1 (quot n div)) div
:else (reverse-fact-help (/ n div) (+ div 1))))
(defn reverse-fact [n] (reverse-fact-help n 2))
Suppose n=120, div=2. 120/2=60, 60/3=20, 20/4=5, 5/5=1, return 5
Suppose n=12, div=2. 12/2=6, 6/3=2, 2/4=.5, return 'nil'
int p = 1, i = 1;
// assume variable fact_n has the value n!
while (p < fact_n) {
    ++i;
    p = p * i;   // p == i! after this line
}
// if p == fact_n, i is the number you are looking for; otherwise fact_n is not a factorial
I know it isn't a pseudocode, but it's pretty easy to understand
inverse_factorial( X )
{
X_LOCAL = X;
ANSWER = 1;
while(1){
if(X_LOCAL / ANSWER == 1)
return ANSWER;
X_LOCAL = X_LOCAL / ANSWER;
ANSWER = ANSWER + 1;
}
}
This function is based on successive approximations! I created it and implemented it in Advanced Trigonometry Calculator 1.7.0
double arcfact(double f){
double result=0,precision=1000;
int i=0;
if(f>0){
while(precision>1E-309){
while(f>fact(result+precision)&&i<10){
result=result+precision;
i++;
}
precision=precision/10;
i=0;
}
}
else{
result=0;
}
return result;
}
If you do not know whether a number M is N! or not, a decent test is to check whether it is divisible by all the small primes p, continuing until the Stirling approximation of p! is larger than M. Alternatively, if you have a table of factorials but it doesn't go high enough, you can pick the largest factorial in your table and make sure M is divisible by that.
In C from my app Advanced Trigonometry Calculator v1.6.8
double arcfact(double f) {
double i=1,result=f;
while((result/(i+1))>=1) {
result=result/i;
i++;
}
return result;
}
What do you think about that? It works correctly for integer factorials.
Simply divide by successive positive integers, e.g. 5! = 120: 120/2 = 60, 60/3 = 20, 20/4 = 5, 5/5 = 1.
So the last divisor before the result reaches 1 is your number.
In code you could do the following:
res := number
x := 1
for res > 1 {
    x++
    res /= x
}
// x is the answer when number is an exact factorial
or something like that. This calculation needs improvement for non-exact numbers.
Most numbers are not in the range of outputs of the factorial function. If that is what you want to test, it's easy to get an approximation using Stirling's formula or the number of digits of the target number, as others have mentioned, then perform a binary search to determine factorials above and below the given number.
What is more interesting is constructing the inverse of the Gamma function, which extends the factorial function to positive real numbers (and to most complex numbers, too). It turns out construction of an inverse is a difficult problem. However, it was solved explicitly for most positive real numbers in 2012 in the following paper: http://www.ams.org/journals/proc/2012-140-04/S0002-9939-2011-11023-2/S0002-9939-2011-11023-2.pdf . The explicit formula is given in Corollary 6 at the end of the paper.
Note that it involves an integral on an infinite domain, but with a careful analysis I believe a reasonable implementation could be constructed. Whether that is better than a simple successive approximation scheme in practice, I don't know.
C/C++ code that finds which number r is the factorial of (r is the input factorial):
int wtf(int r) {
int f = 1;
while (r > 1)
r /= ++f;
return f;
}
Sample tests:
Call: wtf(1)
Output: 1
Call: wtf(120)
Output: 5
Call: wtf(3628800)
Output: 10
Based on:
Full inverted factorial valid for x>1
Use the suggested calculation. If factorial is expressible in full binary form the algorithm is:
Suppose input is factorial x, x=n!
Return 1 for 1
Find the number of trailing 0's in binary expansion of the factorial x, let us mark it with t
Calculate x/fact(t), x divided by the factorial of t, mathematically x/(t!)
Take the logarithm of x/fact(t) to base t+1, rounded down to the nearest integer; let us mark it with m
Return m+t
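#include <math.h>
#include <stdint.h>

__uint128_t factorial(int n);  /* assumed to be defined elsewhere */

int invert_factorial(__uint128_t fact)
{
    if (fact == 1) return 1;
    /* trailing zero bits; __builtin_ffs only takes an int, so count on the 64-bit halves */
    uint64_t lo = (uint64_t)fact;
    int t = lo ? __builtin_ctzll(lo) : 64 + __builtin_ctzll((uint64_t)(fact >> 64));
    double res = (double)(fact / factorial(t));
    /* cast the whole quotient of logarithms, not just log(res) */
    return t + (int)(log(res) / log(t + 1.0));
}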
128-bit integers cannot go past 34!, since 35! no longer fits in 128 bits.

Random number in range 0 to n

Given a function R which produces true random 32 bit numbers, I would like a function that returns random integers in the range 0 to n, where n is arbitrary (less than 2^32).
The function must produce all values 0 to n with equal probability.
I would like a function that executes in constant time with no if statements or loops, so something like the Java Random.nextInt(n) function is out.
I suspect that a simple modulus will not do the job unless n is a power of 2 -- am I right?
I have accepted Jason's answer, despite it requiring a loop of undetermined duration, since it appears to be the best method to use in practice and essentially answers my question. However I am still interested in any algorithms (even if less efficient) which would be deterministic in nature and be guaranteed to terminate, such as Mark Byers has pointed to.
Without discarding some of the values from the source, you cannot do this. For example, a set of size 2^32 cannot be partitioned into three equally sized sets. Therefore, it is impossible to do this without discarding some of the values and iterating until a non-discarded value is produced.
So, just use this (pseudocode):
rng is random number generator produces uniform integers from [0, max)
compute m = max modulo (n + 1)
do {
draw a random number r from rng
} while(r >= max - m)
return r modulo (n + 1)
Effectively I am throwing out the top part of the distribution that causes problems. If rng is uniform on [0, max), then this algorithm will be uniform on [0, n]
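A Python version of that pseudocode, assuming R is passed in as a zero-argument function returning values uniform on [0, max_value); the name `uniform_0_to_n` is illustrative. At most n of the max_value possible draws are rejected, so the loop almost always runs once.

```python
def uniform_0_to_n(n, R, max_value=2**32):
    """Uniform integer in [0, n], given R() uniform on [0, max_value)."""
    span = n + 1
    limit = max_value - (max_value % span)   # largest multiple of span <= max_value
    while True:
        r = R()
        if r < limit:                        # accepted draws split evenly mod span
            return r % span


# Exhaustive check with a 4-bit source: feed every value 0..15 in turn.
it = iter(range(16))
out = [uniform_0_to_n(2, lambda: next(it), max_value=16) for _ in range(5)]
```

With max_value = 16 and n = 2, only the draw 15 is rejected, and 0..14 map onto 0, 1, 2 five times each.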
What you're asking for is impossible. You can't partition 2**32 numbers into three sets of exactly equal size.
If you want to guarantee an absolutely perfect uniform distribution in 0 <= x < n, where n is not a power of 2, then you have to be prepared to call R potentially an infinite number of times. In reality you will typically need only one or two calls, but the code has to in theory be able to call R any number of times, otherwise it can't be completely uniform.
I don't understand why modulus wouldn't do what you want? Since R is a function that produces true random 32 bit numbers, that means that each number has the same probability to be produced, right? So, if you use a modulus n:
randomNumber = R() % (n + 1) //EDITED: n+1 to return values from 0-n
then each number from 0 to n has the same probability!
You can generate two 32-bit numbers and combine them into one 64-bit number. If you then reduce it modulo n + 1 without discarding anything (for n + 1 of at most 32 bits), the worst case is a bias factor of 0.99999999976716936 (that is, 1 - 2^-32): some values are less likely than others by this factor.
If you still want to remove this small bias, out-of-range draws will be rare, so you will seldom need more than one retry.
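A Python sketch of the 64-bit variant with the rejection step kept, assuming R returns uniform 32-bit values; a seeded random.Random stands in for R here, and the function names are hypothetical. Because at most n of the 2**64 combined values are rejected, a retry is extremely rare.

```python
import random


def draw_64(R):
    """Combine two 32-bit draws from R into one uniform 64-bit integer."""
    return (R() << 32) | R()


def uniform_0_to_n_64(n, R):
    """Uniform integer in [0, n] for n < 2**32, rejecting the biased tail."""
    span = n + 1
    limit = (1 << 64) - ((1 << 64) % span)
    while True:
        x = draw_64(R)
        if x < limit:
            return x % span


rng = random.Random(12345)          # stand-in for R; any uniform 32-bit source works
sample = [uniform_0_to_n_64(6, lambda: rng.getrandbits(32)) for _ in range(1000)]
```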
Depending upon your problem/use of the random numbers, maybe you could pre-allocate your random numbers using a slow method and put them into a simple array.
Then getNextRnd() can just return the next in the array.
Quick, fixed time call, no branches, just wasting memory (which is usually pretty cheap) and process initialization time.
