I am new to FFTs, so I am slightly confused about some concepts. So far, the FFT examples I've seen for polynomial multiplication involve polynomials with consecutive exponents (i.e. A(x) = 1 + 3x + 5x^2 + ... and B(x) = 4 + 6x + 9x^2 + ... and C(x) = A(x)*B(x)). However, is it possible to use FFT on two polynomials that do not have consecutive exponents? For example, is it possible to use FFT to multiply:
A(x) = 1 + 3x^2 + 9x^8
and
B(x) = 5x + 6x^3 + 10x^8
in O(n log n) time?
If not, are there any cases where the runtime will be O(n log n)? For example, if the number of terms in the product is O(n) instead of O(n^2)?
Even if the runtime is more than O(n log n), how can we use FFT to minimize the runtime?
Yes, it is possible to use DFFT on polynomials with missing exponents.
The missing exponents simply have coefficient 0, which is also a number. Just rewrite your polynomials:
A(x) = 1 + 3x^2 + 9x^8
B(x) = 5x + 6x^3 + 10x^8
to something like this:
A(x) = 1x^0 + 0x^1 + 3x^2 + 0x^3 + 0x^4 + 0x^5 + 0x^6 + 0x^7 + 9x^8
B(x) = 0x^0 + 5x^1 + 0x^2 + 6x^3 + 0x^4 + 0x^5 + 0x^6 + 0x^7 + 10x^8
so your vectors for DFFT are:
A = (1,0,3,0,0,0,0,0, 9)
B = (0,5,0,6,0,0,0,0,10)
Add zeros so the vector can hold the full product (the result needs max A exponent + max B exponent + 1 = 17 coefficients; padding to 9 + 9 = 18 is also fine), and then round up to the closest power of 2 for DFFT usage. So the original sizes are 9, 9 -> 9 + 9 -> 18 -> round up -> 32:
A = (1,0,3,0,0,0,0,0, 9,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0)
B = (0,5,0,6,0,0,0,0,10,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0)
// | original | correct result | nearest power of 2 |
and do the DFFT stuff you want ... I assume you want to do something like this:
A' = DFFT(A)
B' = DFFT(B)
C'(i) = A'(i) * B'(i) // i = 0..n-1
C= IDFFT(C')
which is O(n*log(n)). Do not forget that if you use DFFT (not DFT), then n = 32 and not 18, because n must be a power of 2 for the fast algorithm. Also, if you want performance improvements, look at the DFFT weight matrices for DFFT(A) and DFFT(B): they are the same, so there is no need to compute them twice.
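As a concrete illustration, here is a minimal sketch of the recipe above in Python (numpy's FFT assumed; any FFT routine works the same way):

import numpy as np

# Coefficient vectors, index = exponent; missing exponents get coefficient 0.
A = [1, 0, 3, 0, 0, 0, 0, 0, 9]        # A(x) = 1 + 3x^2 + 9x^8
B = [0, 5, 0, 6, 0, 0, 0, 0, 10]       # B(x) = 5x + 6x^3 + 10x^8

n = 32                                  # power of 2 >= len(A) + len(B)
Af = np.fft.fft(A, n)                   # fft zero-pads the input to length n
Bf = np.fft.fft(B, n)
C = np.fft.ifft(Af * Bf).real.round().astype(int)
# C(x) = 5x + 21x^3 + 18x^5 + 10x^8 + 45x^9 + 30x^10 + 54x^11 + 90x^16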
My problem is limited to unsigned integers of 256 bits.
I have a value x, and I need to scale it down by the ratio n / d, where n < d.
The simple solution is of course x * n / d, but the problem is that x * n may overflow.
I am looking for any arithmetic trick which may help in reaching a result as accurate as possible.
Dividing each of n and d by gcd(n, d) before calculating x * n / d does not guarantee success.
Is there any process (iterative or other) which I can use in order to solve this problem?
Note that I am willing to settle for an inaccurate solution, but I'd need to be able to estimate the error.
NOTE: Using integer division instead of normal division
Let us suppose
x = ad + b
n = cd + e
Then find a,b,c,e as follows:
a = x/d
b = x%d
c = n/d
e = n%d
Then, since nx = (ad + b)(cd + e) = (acd + ae + bc)d + be,
nx/d = acd + ae + bc + be/d
CALCULATING be/d
1. Represent e in binary form
2. Find b/d, 2b/d, 4b/d, 8b/d, ..., 2^k b/d (one doubling per bit of e) and their remainders
3. Then be/d = (the sum of the quotients for the set bits of e) + (the sum of the corresponding remainders)/d
Example:
e = 5 = 101 in binary = 4 + 1
be/d = (b/d + 4b/d) + (b%d + 4b%d)/d
FINDING b/d, 2b/d, ... 2^k b/d
quotient(2ib/d) = 2*quotient(ib/d) + (2*remainder(ib/d))/d
remainder(2ib/d) = (2*remainder(ib/d))%d
Executes in O(number of bits)
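A minimal Python sketch of this scheme (my own names; Python integers are arbitrary precision, so this only models the 256-bit constraint, but no intermediate product here exceeds roughly the magnitude of x):

def muldiv(x, n, d):
    # x*n/d = acd + ae + bc + be/d, where x = ad + b and n = cd + e.
    a, b = divmod(x, d)
    c, e = divmod(n, d)
    # Accumulate be/d bit by bit, tracking quotient/remainder of (2^i * b)/d.
    q, r = divmod(b, d)
    quot = rem = 0
    bits = e
    while bits:
        if bits & 1:
            quot += q
            rem += r
        bits >>= 1
        q, r = 2*q + (2*r)//d, (2*r) % d   # double: 2^i * b -> 2^(i+1) * b
    quot += rem // d                       # leftover remainders may sum past d
    return a*c*d + a*e + b*c + quot

print(muldiv(612, 5, 7))   # 437, matches 612*5//7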
Given a set A, I have to find the sum of Fibonacci(subset sum) over all the subsets of A.
Fibonacci(X) is the Xth element of the Fibonacci series.
For example, for A = {1,2,3}:
Fibonacci(1) + Fibonacci(2) + Fibonacci(3) + Fibonacci(1+2) + Fibonacci(2+3) + Fibonacci(1+3) + Fibonacci(1+2+3)
1 + 1 + 2 + 2 + 5 + 3 + 8 = 22
Is there any way I can find the sum without generating the subsets?
I ask since I can find the sum of all subset sums easily,
i.e. sum of all subset sums = (1+2+3)*(pow(2, length of set - 1)).
There surely is.
First, let's recall that the nth Fibonacci number equals
F(n) = [φ^n - (-φ)^(-n)]/√5
where φ = (√5 + 1)/2 (the Golden Ratio) and (-φ)^(-1) = (1-√5)/2. But to make this shorter, let me denote φ by A and (-φ)^(-1) by B.
Next, let's notice that a sum of Fibonacci numbers is a sum of powers of A and B:
[F(n) + F(m)]*√5 = A^n + A^m - B^n - B^m
Now, in the {1,2,3} example, it is enough to compute
A^1 + A^2 + A^3 + A^{1+2} + A^{1+3} + A^{2+3} + A^{1+2+3}.
But hey, there's a simpler expression for this:
(A^1 + 1)(A^2 + 1)(A^3 + 1) - 1
Now, it is time to get the whole result.
Let our set be {n1, n2, ..., nk}. Then our sum will be equal to
Sum = 1/√5 * [(A^n1 + 1)(A^n2 + 1)...(A^nk + 1) - (B^n1 + 1)(B^n2 + 1)...(B^nk + 1)]
I think, mathematically, this is the "simplest" form of the answer, as there is no relation between the n_i. However, there could be some room for computational optimization of this expression. In fact, I'm not sure at all that this (using real numbers) will run faster than the "straightforward" summing, but the question was about avoiding subset generation, so here's the answer.
I tested the answer from YakovL using Python 2.7. It works very well and is plenty quick. I cannot imagine that summing the sequence values would be quicker. Here's the implementation.
_phi = (5.**0.5 + 1.)/2.  # the Golden Ratio
A = lambda n: _phi**n
B = lambda n: (-_phi)**(-n)
prod = lambda it: reduce(lambda x, y: x*y, it)  # product of an iterable (reduce is a Python 2 built-in)
subset_sum = lambda s: (prod(A(n)+1 for n in s) - prod(B(n)+1 for n in s))/5**0.5
And here are some test results:
print subset_sum({1, 2, 3})
# 22.0
# [Finished in 0.1s]
print subset_sum({1, 2, 4, 8, 16, 32, 64, 128, 256, 512})
# 7.29199318438e+213
# [Finished in 0.1s]
I have a data set of the form:
[9.1 5.6 7.4] => 8.5, [4.1 4.4 5.2] => 4.9, ... , x => y(x)
So x is a real vector of three elements and y is a scalar function.
I'm assuming a weighted average model of this data:
y(x) = (a * x[0] + b * x[1] + c * x[2]) / (a+b+c) + E(x)
where E is an unknown random error term.
I need an algorithm to find a,b,c, that minimizes total sum square error:
error = sum over all x of { E(x)^2 }
for a given data set.
Assume that the weights are normalized to sum to 1 (which happily is without loss of generality). Then we can recast the problem with c = 1 - a - b, so we are actually solving only for a and b.
With this we can write
error(a,b) = sum over all x { a x[0] + b x[1] + (1 - a - b) x[2] - y(x) }^2
Now it's just a question of taking the partial derivatives d_error/da and d_error/db and setting them to zero to find the minimum.
With some fiddling, you get a system of two equations in a and b.
C(X[0],X[0],X[2]) a + C(X[0],X[1],X[2]) b = C(X[0],Y,X[2])
C(X[1],X[0],X[2]) a + C(X[1],X[1],X[2]) b = C(X[1],Y,X[2])
Here X[i] denotes the vector of the i'th components of all the x values in the dataset,
and Y denotes the vector of all y(x) values.
The coefficient function C has the following meaning:
C(p, q, r) = sum over i { p[i] ( q[i] - r[i] ) }
I'll omit how to solve the 2x2 system unless this is a problem.
If we plug in the two-element data set you gave, we should get exact coefficients, because two free parameters can always fit two data points perfectly. So, for example, the first equation's coefficients are:
C(X[0],X[0],X[2]) = 9.1(9.1 - 7.4) + 4.1(4.1 - 5.2) = 10.96
C(X[0],X[1],X[2]) = -19.66
C(X[0],Y,X[2]) = 8.78
Similarly, for the second equation: C(X[1],X[0],X[2]) = 4.68, C(X[1],X[1],X[2]) = -13.6, C(X[1],Y,X[2]) = 4.84.
Solving the 2x2 system produces: a = 0.42515, b = -0.20958. Therefore c = 0.78443.
Note that in this problem a negative coefficient results. There is nothing to guarantee the coefficients will be positive, though "real" data sets may well produce all-positive results.
Indeed if you compute weighted averages with these coefficients, they are 8.5 and 4.9.
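Here is a quick sketch of this computation in Python with numpy (my own code, not from the original answer), using the two given data points:

import numpy as np

data = np.array([[9.1, 5.6, 7.4, 8.5],    # columns: x[0], x[1], x[2], y
                 [4.1, 4.4, 5.2, 4.9]])
X0, X1, X2, Y = data.T

def C(p, q, r):
    # C(p, q, r) = sum over i of p[i] * (q[i] - r[i])
    return np.sum(p * (q - r))

M = np.array([[C(X0, X0, X2), C(X0, X1, X2)],
              [C(X1, X0, X2), C(X1, X1, X2)]])
rhs = np.array([C(X0, Y, X2), C(X1, Y, X2)])
a, b = np.linalg.solve(M, rhs)
print(a, b, 1 - a - b)    # approximately 0.42515, -0.20958, 0.78443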
For fun I also tried this data set:
X[0] X[1] X[2] Y
0.018056028 9.70442075 9.368093544 6.360312244
8.138752835 5.181373099 3.824747424 5.423581239
6.296398214 4.74405298 9.837741509 7.714662742
5.177385358 1.241610571 5.028388255 4.491743107
4.251033792 8.261317658 7.415111851 6.430957844
4.720645386 1.0721718 2.187147908 2.815078796
1.941872069 1.108191586 6.24591771 3.994268819
4.220448549 9.931055481 4.435085917 5.233711923
9.398867623 2.799376317 7.982096264 7.612485261
4.971020963 1.578519218 0.462459906 2.248086465
I generated the Y values with 1/3 x[0] + 1/6 x[1] + 1/2 x[2] + E where E is a random number in [-0.1..+0.1]. If the algorithm is working correctly we'd expect to get roughly a = 1/3 and b = 1/6 from this result. Indeed we get a = .3472 and b = .1845.
OP has now said that his actual data are larger than 3-vectors. The method generalizes without much trouble: if the vectors are of length n, then you get an (n-1) x (n-1) system to solve.
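For that general case, here is a hedged sketch of mine that uses numpy's least-squares solver rather than hand-derived equations, exploiting the same substitution (last weight = 1 minus the rest); it need not agree with the hand-derived system to the last digit:

import numpy as np

def fit_weights(X, y):
    # X: (m, n) array of data vectors; y: (m,) array of targets.
    # Substitute w[n-1] = 1 - w[0] - ... - w[n-2] and solve for the rest.
    D = X[:, :-1] - X[:, [-1]]
    w_head, *_ = np.linalg.lstsq(D, y - X[:, -1], rcond=None)
    return np.append(w_head, 1 - w_head.sum())

Applied to the ten-row dataset above, this should likewise recover weights near (1/3, 1/6, 1/2).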
How do I determine the square root of a floating point number? Is the Newton-Raphson method a good way? I have no hardware square root either. I also have no hardware divide (but I have implemented floating point divide).
If possible, I would prefer to reduce the number of divides as much as possible since they are so expensive.
Also, what should the initial guess be, to reduce the total number of iterations?
Thank you so much!
When you use Newton-Raphson to compute a square-root, you actually want to use the iteration to find the reciprocal square root (after which you can simply multiply by the input--with some care for rounding--to produce the square root).
More precisely: we use the function f(x) = x^(-2) - n. Clearly, if f(x) = 0, then x = 1/sqrt(n). This gives rise to the Newton iteration:
x_(i+1) = x_i - f(x_i)/f'(x_i)
        = x_i - (x_i^(-2) - n)/(-2 x_i^(-3))
        = x_i + (x_i - n x_i^3)/2
        = x_i (3/2 - (1/2) n x_i^2)
Note that (unlike the iteration for the square root), this iteration for the reciprocal square root involves no divisions, so it is generally much more efficient.
I mentioned in your question on divide that you should look at existing soft-float libraries, rather than re-inventing the wheel. That advice applies here as well. This function has already been implemented in existing soft-float libraries.
Edit: the questioner still seems to be confused, so let's work an example: sqrt(612). 612 is 1.1953125 x 2^9 (or b1.0011001 x 2^9, if you prefer binary). Pull the even portion out of the exponent to write the input as f * 2^(2m), where m is an integer and f is in the range [1,4). Then we will have:
sqrt(n) = sqrt(f * 2^(2m)) = sqrt(f) * 2^m
Applying this reduction to our example gives f = 1.1953125 * 2 = 2.390625 (b10.011001) and m = 4. Now do a Newton-Raphson iteration to find x = 1/sqrt(f), using a starting guess of 0.5 (as I noted in a comment, this guess converges for all f, but you can do significantly better using a linear approximation as an initial guess):
x_0 = 0.5
x_1 = x_0*(3/2 - 1/2 * 2.390625 * x_0^2)
= 0.6005859...
x_2 = x_1*(3/2 - 1/2 * 2.390625 * x_1^2)
= 0.6419342...
x_3 = 0.6467077...
x_4 = 0.6467616...
So even with a (relatively bad) initial guess, we get rapid convergence to the true value of 1/sqrt(f) = 0.6467616600226026.
Now we simply assemble the final result:
sqrt(f) = x_n * f = 1.5461646...
sqrt(n) = sqrt(f) * 2^m = 24.738633...
And check: sqrt(612) = 24.738633...
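Here is a short Python sketch of mine that mirrors this worked example end to end (math.frexp splits off the exponent; only multiplications appear in the loop):

import math

def my_sqrt(n, iters=5):
    m, e = math.frexp(n)             # n = m * 2**e, with m in [0.5, 1)
    if e % 2:                        # odd exponent: f = 2m is in [1, 2)
        f, half_e = m * 2.0, (e - 1) // 2
    else:                            # even exponent: f = 4m is in [2, 4)
        f, half_e = m * 4.0, (e - 2) // 2
    x = 0.5                          # initial guess for 1/sqrt(f)
    for _ in range(iters):
        x = x * (1.5 - 0.5 * f * x * x)   # division-free Newton step
    return (x * f) * 2.0**half_e     # sqrt(f) = x*f, then restore 2^m

print(my_sqrt(612.0))                # ~24.738633..., as in the example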
Obviously, if you want correct rounding, careful analysis is needed to ensure that you carry sufficient precision at each stage of the computation. This requires careful bookkeeping, but it isn't rocket science: you simply keep careful error bounds and propagate them through the algorithm.
If you want correct rounding without explicitly checking a residual, you need to compute sqrt(f) to a precision of 2p + 2 bits (where p is the precision of the source and destination type). However, you can also take the strategy of computing sqrt(f) to a little more than p bits, squaring that value, and adjusting the trailing bit by one if necessary (which is often cheaper).
sqrt is nice in that it is a unary function, which makes exhaustive testing for single-precision feasible on commodity hardware.
You can find the OS X soft-float sqrtf function on opensource.apple.com, which uses the algorithm described above (I wrote it, as it happens). It is licensed under the APSL, which may or may not be suitable for your needs.
Probably (still) the fastest implementation for finding the inverse square root, and the 10 lines of code that I adore the most, is the famous "fast inverse square root" from Quake III Arena.
It's based on Newton approximation, but with a few quirks: a bit-level trick produces the initial guess. There's even a great story around its magic constant.
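For reference, here is a Python transcription of that trick (the original is C; struct is used to reinterpret the float's bits as a 32-bit integer):

import struct

def fast_inv_sqrt(x):
    i = struct.unpack('<I', struct.pack('<f', x))[0]   # float bits -> int
    i = 0x5f3759df - (i >> 1)                          # the magic constant
    y = struct.unpack('<f', struct.pack('<I', i))[0]   # int bits -> float
    return y * (1.5 - 0.5 * x * y * y)                 # one Newton step

print(fast_inv_sqrt(612.0))   # ~0.0404, versus 1/sqrt(612) = 0.040423...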
Easiest to implement (you can even implement this on a calculator):

def sqrt(x, TOL=0.000001):
    y = 1.0
    while abs(x/y - y) > TOL:   # stop once y and x/y agree to within TOL
        y = (y + x/y)/2.0       # replace y with the average of y and x/y
    return y
This is exactly the Newton-Raphson iteration
y_new = y - f(y)/f'(y)
with f(y) = y^2 - x and f'(y) = 2y.
Substituting these values:
y_new = y - (y^2 - x)/(2y) = (y^2 + x)/(2y) = (y + x/y)/2
If division is expensive, you should consider the shifting nth-root algorithm: http://en.wikipedia.org/wiki/Shifting_nth-root_algorithm
Shifting algorithms:
Let us assume you have two numbers a and b such that the least significant set bit of a is higher than all of b, and b has only one bit set (e.g. a = 1000 and b = 10 in binary). Let s(b) = log_2(b) (which is just the position of the set bit in b).
Assume we already know the value of a^2. Now (a+b)^2 = a^2 + 2ab + b^2: a^2 is already known, 2ab is a shifted left by s(b)+1, and b^2 is b shifted left by s(b).
Algorithm:
Initialize a such that a has only one bit set and a^2 <= n < (2*a)^2.
Let q = s(a).

b = a
sqra = a*a
for i = q-1 down to -10 (or whatever significance you want):
    b = b/2
    sqrab = sqra + 2ab + b^2
    if sqrab > n:
        continue
    sqra = sqrab
    a = a + b
Example: n = 612
a = 10000b (16)
sqra = 256
Iteration 1:
b=01000 (8)
sqrab = (a+b)^2 = 24^2 = 576
sqrab < n => a=a+b = 24
Iteration 2:
b = 4
sqrab = (a+b)^2 = 28^2 = 784
sqrab > n => a=a
Iteration 3:
b = 2
sqrab = (a+b)^2 = 26^2 = 676
sqrab > n => a=a
Iteration 4:
b = 1
sqrab = (a+b)^2 = 25^2 = 625
sqrab > n => a=a
Iteration 5:
b = 0.5
sqrab = (a+b)^2 = 24.5^2 = 600.25
sqrab < n => a=a+b = 24.5
Iteration 6:
b = 0.25
sqrab = (a+b)^2 = 24.75^2 = 612.5625
sqrab > n => a=a
Iteration 7:
b = 0.125
sqrab = (a+b)^2 = 24.625^2 = 606.390625
sqrab < n => a=a+b = 24.625
and so on.
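A runnable Python version of this procedure (floats stand in for the fractional shifts to keep the sketch short; a fixed-point version would use integer shifts throughout):

def shifting_sqrt(n, frac_bits=10):
    a = 1.0
    while (2*a)**2 <= n:        # highest power of two with a^2 <= n
        a *= 2
    sqra = a*a
    b = a
    while True:
        b /= 2
        if b < 2.0**(-frac_bits):
            break
        sqrab = sqra + 2*a*b + b*b   # (a+b)^2 from the known a^2
        if sqrab > n:
            continue
        sqra = sqrab
        a += b
    return a

print(shifting_sqrt(612))       # 24.73828125, close to sqrt(612)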
A good approximation to the square root on the range [1,4) is

def sqrt(x):
    # Degree-6 polynomial fit, evaluated in Horner form.
    y = x*(-0.000267)
    y = x*(0.004686 + y)
    y = x*(-0.034810 + y)
    y = x*(0.144780 + y)
    y = x*(-0.387893 + y)
    y = x*(0.958108 + y)
    return y + 0.315413
Normalise your floating point number so the mantissa is in the range [1,4), use the above algorithm on it, and then divide the exponent by 2. No floating point divisions anywhere.
With the same CPU time budget you can probably do much better, but that seems like a good starting point.
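The normalization step might look like this (a sketch of mine using math.frexp; poly_sqrt stands for the [1,4) approximation above, there named sqrt):

import math

def full_sqrt(x):
    m, e = math.frexp(x)        # x = m * 2**e, with m in [0.5, 1)
    if e % 2:                   # make the remaining exponent even
        f, e = m * 2.0, e - 1   # f in [1, 2)
    else:
        f, e = m * 4.0, e - 2   # f in [2, 4)
    return math.ldexp(poly_sqrt(f), e // 2)   # sqrt(f) * 2**(e/2)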
I have the equation y = 3(x+1)^2 + 5(x+1)^4.
Using Horner's scheme, I could evaluate the expanded polynomial as y = 8 + x(26 + x(33 + x(20 + 5x))), requiring 8 arithmetic operations.
I could also evaluate it in the form y = (x+1)^2 * ((5x + 10)x + 8), requiring 7 operations.
I've been told this can be done in 5 operations, but Horner's scheme is supposed to be the most efficient, and the best I have found is 7 operations. Am I missing something?
Let a = (x+1)^2; that's 2 ops. Then y = 3a + 5a^2 = a(3 + 5a), which is 3 more ops, for a total of 5.
3(x+1)^2 + 5(x+1)^4 = (x+1)^2[3 + 5(x+1)^2].
I can do that in 5 operations:
1) x+1
2) (x+1)^2
3) 5(x+1)^2
4) 5(x+1)^2 + 3
5) (x+1)^2[5(x+1)^2 + 3]
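A quick sanity check of the 5-operation scheme against the Horner form of the expanded polynomial (Python, my own test values):

def five_op(x):
    t = x + 1                    # op 1
    a = t * t                    # op 2: (x+1)^2
    return a * (5*a + 3)         # ops 3-5

def horner(x):
    return 8 + x*(26 + x*(33 + x*(20 + 5*x)))

for x in (-2.0, 0.0, 0.5, 3.0):
    assert abs(five_op(x) - horner(x)) < 1e-9
print("both forms agree")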