Why is squaring a number faster than multiplying two random numbers? - algorithm

Multiplying two binary numbers takes n^2 time, yet squaring a number can be done more efficiently somehow. (with n being the number of bits) How could that be?
Or is it not possible? This is insanity!

There exist algorithms more efficient than O(N^2) to multiply two numbers (see Karatsuba, Pollard, Schönhage–Strassen, etc.)
The two problems "multiply two arbitrary N-bit numbers" and "Square an arbitrary N-bit number" have the same complexity.
We have
4*x*y = (x+y)^2 - (x-y)^2
So if squaring N-bit integers takes O(f(N)) time, then the product of two arbitrary N-bit integers can be obtained in O(f(N)) too. (that is 2x N-bit sums, 2x N-bit squares, 1x 2N-bit sum, and 1x 2N-bit shift)
And obviously we have
x^2 = x * x
So if multiplying two N-bit integers takes O(f(N)), then squaring a N-bit integer can be done in O(f(N)).
Any algorithm computing the product (resp the square) provides an algorithm to compute the square (resp the product) with the same asymptotic cost.
As noted in other answers, the algorithms used for fast multiplication can be simplified in the case of squaring. The gain will be on the constant in front of the f(N), and not on f(N) itself.

Squaring an n digit number may be faster than multiplying two random n digit numbers. Googling I found this article. It is about arbitrary precision arithmetic but it may be relevant to what your asking. In it the authors say this:
In squaring a large integer, i.e. X^2
= (xn-1, xn-2, ... , x1, x0)^2 many cross-product terms of the form xi *
xj and xj * xi are equivalent. They
need to be computed only once and then
left shifted in order to be doubled.
An n-digit squaring operation is
performed using only (n^2 + n)/2
single-precision multiplications.

Like others have pointed out, squaring can only be about 1.5X or 2X faster than regular multiplication between arbitrary numbers. Where does the computational advantage come from? It's symmetry. Let's calculate the square of 1011 and try to spot a pattern that we can exploit. u0:u3 represent the bits in the number from the most significant to the least significant.
1011 // u3 * u0 : u3 * u1 : u3 * u2 : u3 * u3
1011 // u2 * u0 : u2 * u1 : u2 * u2 : u2 * u3
0000 // u1 * u0 : u1 * u1 : u1 * u2 : u1 * u3
1011 // u0 * u0 : u0 * u1 : u0 * u2 : u0 * u3
If you consider the elements ui * ui for i=0, 1, ..., 4 to form the diagonal and ignore them, you'll see that the elements ui * uj for i ≠ j are repeated twice.
Therefore, all you need to do is calculate the product sum for elements below the diagonal and double it, with a left shift. You'd finally add the diagonal elements. Now you can see where the 2X speed up comes from. In practice, the speed-up is about 1.5X because of the diagonal and extra operations.

I believe you may be referring to exponentiation by squaring . This technique isn't used for multiplying, but for raising to a power x^n, where n may be large. Rather than multiply x
times itself N times, one performs a series of squaring and adding operations which can be mapped to the binary representation of N. The number of multiplication operations (which are more expensive than additions for large numbers) is reduced from N to log(N) with respect to the naive exponentiation algorithm.

Do you mean multiplying a number by a power of 2? This is usually quicker than multiplying any two random numbers since the result can be calculated by simple bit shifting. However, bear in mind that modern microprocessors dedicate lots of brute force silicon to these types of calculations and most arithmetic is performed with blinding speed compared to older microprocessors

I have it!
2 * 2
is more expensive than
2 << 1
(The caveat being it only works for one case.)

Suppose you want to expand out the multiplication (a+b)×(c+d). It splits up into four individual multiplications: a×c + a×d + b×c + b×d.
But if you want to expand out (a+b)², then it only needs three multiplications (and a doubling): a² + 2ab + b².
(Note also that two of the multiplications are themselves squares.)
Hopefully this just begins to give an insight into some of the speedups that are possible when performing a square over a regular multiplication.

First of all great question! I wish there were more questions like this.
So it turns out that the method I came up with is O(n log n) for general multiplication in the arithmetic complexity only. You can represent any number X as
X = x_{n-1} 2^{n-1} + ... + x_1 2^1 + x_0 2^0
Y = y_{m-1} 2^{m-1} + ... + y_1 2^1 + y_0 2^0
where
x_i, y_i \in {0,1}
then
XY = sum _ {k=0} ^ m+n r_k 2^k
where
r_k = sum _ {i=0} ^ k x_i y_{k-i}
which is just a straight forward application of FFT to find the values of r_k for each k in (n +m) log( n + m) time.
Then for each r_k you must determine how big the overflow is and add it up accordingly. For squaring a number this means O(n log n) arithmetic operations.
You can add up the r_k values more efficiently using the Schönhage–Strassen algorithm to obtain a O(n log n log log n) bit operation bound.
The exact answer to your question is already posted by Eric Bainville.
However, you can get a much better bound than O(n^2) for squaring a number simply because there exist much better bounds for multiplying integers!

If you assume fixed length to the word size of the machine and that the number to be squared is in memory, a squaring operation requires only one load from memory, so could be faster.
For arbitrary length integers, multiplication is typically O(N²) but there are algorithms which reduce this for large integers.
If you assume the simple O(N²) approach to multiply a by b, then for each bit in a you have to shift b and add it to an accumulator if that bit is one. For each bit in a you need 3N shifts and additions.
Note that
( x - y )² = x² - 2 xy + y²
Hence
x² = ( x - y )² + 2 xy - y²
If each y is the largest power of two not greater than x, this gives a reduction to a lower square, two shifts and two additions. As N is reduced on each iteration, you may get an efficiency gain ( the symmetry means it visits each point in a triangle rather than a rectangle ), but it's still O(N²).
There may be another better symmetry to exploit.

a^2
(a+b)*(a+b)+b^2 eg. 66^2 = (66+6)(66-6)+6^2 = 72*60+36= 4356
for a^n just use the power rule
66^4 = 4356^2

I would want to solve the problem by N bit multiplication
for a number
A the bits be A(n-1)A(n-2)........A(1)A(0).
B the bits be B(n-1)B(n-2)........B(1)B(0).
for the square of number A the unique multiplication bits generated will be
for A(0)->A(0)....A(n-1)
A(1)->A(1)....A(n-1) and so on
so the total operations will be
OP = n + n-1 + n-2 ....... + 1
Therefore OP = n^2+n/2;
so the Asymptotic notation will be O(n^2)
and for multiplication of A and B n^2 unique multiplications will be generated
so the Asymptotic notation will be O(n^2)

The square root of 2n is 2n / 2 or 2n >> 1, so if your number is a power of two everything is totally simple once you know the power. To multiply is even simplier: 24 * 28 is 24+8. There's no sense in this statements you've done.

If you have a binary number A, it can (always, proof left to the eager reader) be expressed as (2^n + B), this can be squared as 2^2n + 2^(n+1)B + B^2. We can then repeat the expansion, until such a point that B equals zero. I haven't looked too hard at it, but intuitively, it feels as if you should be able to make a squaring function take fewer algorithmical steps than a general-purpose multiplication.

I think that you are completely wrong in your statements
Multiplying two binary numbers takes
n^2 time
Multiplying two 32bit numbers take exactly one clock cycle. On a 64 bit processor, I would assume that multiplying two 64 bit numbers take exactly 1 clock cycle. It wouldn't even surprise my that a 32bit processor can multiply two 64bit numbers in 1 clock cycle.
yet squaring a number can be done more efficiently somehow.
Squaring a number is just multiplying the number with itself, so that is just a simple multiplication. There is no "square" operation in the CPU.
Maybe you are confusing "squaring" with "multiplying by a power of 2". Multiplying by 2 can be implemeted by shifting all the bits one position to the "left". Multiplying by 4 is shifting all the bits two positions to the "left". By 8, 3 positions. But this trick only applies to a power of two.

Related

Big O of multiplying 2 complex numbers?

What is the time complexity for multiplying two complex numbers?
For example (35 + 12i) *(45 +23i)
The asymptotic complexity is the same as for multiplying the components.
(35 + 12i) * (45 + 23i) == 35*45 + 45*12i + 35*23i - 12*23
== (35*45 - 12*23) + (45*12 + 35*23)i
You just have 4 real multiplications and 2 real additions.
So, if real multiplication is O(1), so is complex multiplication.
If real multiplication is not constant (as is the case for arbitrary precision values), then neither is complex multiplication.
If you multiply two complex numbers (a + bi) and (c + di), the calculation works out to (ac - bd, adi + bci), which requires a total of four multiplications and two subtractions. Additions and subtractions take less time than multiplications, so the main cost is the four multiplications done here. Since four is a constant, this doesn't change the big-O runtime of doing the muliplications compared to the real number case.
Let's imagine you have two numbers n1 and n2, each of which is d digits long. If you use the grade-school method for multiplying these numbers together, you'd do the following:
for each digit d1 of n2, in reverse:
let carry = 0
for each digit d2 of n1, in reverse:
let product = d1 * d2 + carry
write down product mod 10
set carry = product / 10, rounding down
add up all d of the d-digit numbers you wrote in step 1
That first loop runs in time Θ(d2), since each digit in n2 is paired and multiplied with each digit of n1, doing O(1) work apiece. The result is d different d-digit numbers. Adding up those numbers will take time Θ(d2), since you have to scan each number of each digit exactly once. Overall, this takes time Θ(d2).
Notice that this runtime is a function of how many digits are in n1 and n2, rather than n1 and n2 themselves. The number of digits in a number n is Θ(log n), so this runtime is actually O((log max{n1, n2})2) if you're multiplying two numbers n1 and n2.
This is not the fastest way to do multiplications, though for a while there was a conjecture that it was. Karatsuba's algorithm runs in time O((log max{n1, n2})log3 4), where the exponent is around 1.7ish. There are more modern algorithms that run even faster of this, and it's an open problem whether it can be done in time O(log d) with no exponent!
Multiplying two complex numbers only requires three real multiplications.
Let p = a * c, q = b * d, and r = (a + b) * (c + d).
Then (a + bi) * (c + di) = (p - q) + i(r - p - q).
See also Complex numbers product using only three multiplications.

Find prime factors such that difference is smallest as possible

Suppose n, a, b are positive integers where n is not a prime number, such that n=ab with a≥b and (a−b) is small as possible. What would be the best algorithm to find the values of a and b if n is given?
I read a solution where they try to represent n as the difference between two squares via searching for a square S bigger than n such that S - n = (another square). Why would that be better than simply finding the prime factors of n and searching for the combination where a,b are factors of n and a - b is minimized?
Firstly....to answer why your approach
simply finding the prime factors of n and searching for the combination where a,b are factors of n and a - b is minimized
is not optimal:
Suppose your number is n = 2^7 * 3^4 * 5^2 * 7 * 11 * 13 (=259459200), well within range of int. From the combinatorics theory, this number has exactly (8 * 5 * 3 * 2 * 2 * 2 = 960) factors. So, firstly you find all of these 960 factors, then find all pairs (a,b) such that a * b = n, which in this case will be (6C1 + 9C2 + 11C3 + 13C4 + 14C5 + 15C6 + 16C7 + 16C8) ways. (if I'm not wrong, my combinatorics is a bit weak). This is of the order 1e5 if implemented optimally. Also, implementation of this approach is hard.
Now, why the difference of squares approach
represent S - n = Q, such that S and Q are perfect squares
is good:
This is because if you can represent S - n = Q, this implies, n = S - Q
=> n = s^2 - q^2
=> n = (s+q)(s-q)
=> Your reqd ans = 2 * q
Now, even if you iterate for all squares, you will either find your answer or terminate when difference of 2 consecutive squares is greater than n
But I don't think this will be doable for all n (eg. if n=6, there is no solution for (S,Q).)
Another approach:
Iterate from floor(sqrt(n)) to 1. The first number (say, x), such that x|n will be one of the numbers in the required pair (a,b). Other will be, obvs, y = x/n. So, your answer will be y - x.
This is O(sqrt(n)) time complex algorithm.
A general method could be this:
Find the prime factorization of your number: n = Π pi ai. Except for the worst cases where n is prime or semiprime, this will be substantially faster than O(n1/2) time of the iteration down from the square root, which won't divide the found factors out of the number.
Recall that the simplest, trial division, prime factorization is done by repeatedly trying to divide the number by increasing odd numbers (or by primes) below the number's square root, dividing out of the number each factor -- thus prime by construction -- as it is found (n := n/f).
Then, lazily enumerate the factors of n in order from its prime factorization. Stop after producing half of them. Having thus found n's (not necessarily prime) factor that is closest to its square root, find the second factor by simple division.
In case this must repeatedly run many times, it will greatly pay out to precalculate the needed primes below the n's square root, to use in the factorizations.

Algorithm on exponential with irrational base

I know there is an O(logn) algorithm on calculating a^n where a is an integer, and n is a huge integer (probably the result need to modular another prime MOD).
I wondering whether there is still an O(logn) algorithm to calculate
(a+sqrt(b))^n + (a-sqrt(b))^n (mod MOD)
The irrational part sqrt(b) looks not easy to handle in the exponential calculation. All I can do is to calculate a+sqrt(b) and a-sqrt(b) part separately and add them together then do the modular, but if n is huge, it is easy to overflow. Any ideas?
You can do that by computing (in ZM[x] / &langle;x²-b&rangle;)
(a+x)^n+(a-x)^n mod (M, x^2-b)
where again you can use modular halving-and-squaring for the powers, where the intermediate results now are linear polynomials (over modular integers). Actually, you will only need one of the powers, the result is twice the constant coefficient.
Alternatively, these power combinations are the solution of the linear recursion of order 2
u[n+2]-2*a*u[n+1]+(a^2-b)*u[n]
where
u[0]=2 and u[1]=2*a
so that you can use fast matrix exponentiation of the system matrix of this recursion, again obtaining an O(log(n)) algorithm (disregarding bitsize).
Example: As per the comment, take a=3, b=8, n=2 (and integers mod M=10^9+7, example is not large enough for that to matter)
In the first variant, compute u[n]=(a+x)^n mod (M, x^2-b), so
u[0]=1
u[1]=3+x
u[2]=(3+x)^2 mod (x^2-8)=9+6x+8=17+6x
and twice the constant term is 2*17=34
In the second variant, the recursion is (with 2*a=6, a^2-b=1)
u[n+2]-6*u[n+1]+u[n]=0
so that the first sequence elements are
u[0]=2
u[1]=6
u[2]=6*u[1]-u[0]=34
If you expand (a+sqrt(b))^n + (a-sqrt(b))^n you get
( a + nC1 a^(n-1) √b + nC2 a^(n-2) b + nC3 a^(n-3) √b b + ... )
+( a - nC1 a^(n-1) √b + nC2 a^(n-2) b - nC3 a^(n-3) √b b + ... )
= 2 a + 0 + 2 nC2 a^(n-2) b + 0 + ... + 2 nC4 a^(n-4) b^2 + ...
so the terms involving the possibly irrational parts cancel. (nC2 etc are binomial coefficients).
The RHS of the above could be calculated fairly efficiently using integer arithmetic as you can relate each term in the sequence to the previous one. However there are n/2 terms so the calculation is O(n).
As we know the result will be an integer we can try running through the Exponentiation by squaring algorithm keeping track of the integer a fractional components. Write a+sqrt(b) = x + y where x is an integer an y is the fractional part.
Finding the square of this we have x^2 + 2 x y + y^2. Even though we are only interested in the integer part we have some problems as there is an integer part of 2 x y+ y^2. This causes problems as to effectively calculate the integer part we are going to know a lot of digits of y. When we come to higher powers you need more an more digits of y to get the integer part.
I don't think normal floating point multiplication would be good enough to calculate the terms for very large n.

Exponentiation in a low complexity way

I want to compute q^k, s.t. q is n bits wide, in the limitations:
the final result will be n*k bits wide.
for every step of the calculation, the result of multiplying x,y s.t. x is |x| bits wide and y is |y| bits wide is |x|*|y| bits wide.
I tried to do that in pairs; start with the q^2's, then q^4's, etc.
The 1st step result takes 2n bits, the 2nd takes (2^2)n bits, etc. and the last step takes n*2^(logk) (=kn) bits.
We have log(k) steps, and a Careful calculation brings us to:
O(log(n)(log(k))^2).
I'd be happy to hear about a faster way of doing that (or a better analysis of this algorithm or a similar), in the above restrictions.
Thanks in advance.
Assuming that multiplications with n-bit outputs have cost n f(n) for some function f that is positive and nondecreasing, the final multiplication costs asymptotically no less than the rest of the work. Preparing the squares for q^k where q has n bits costs
2 n f(2 n) + 4 n f(4 n) + ... + 2^floor(lg(k)) f(2^floor(lg(k)) n)
<= (2 n + 4 n + ... + 2^floor(lg(k)) n) f(2^floor(lg(k)) n)
<= 2 (2^floor(lg(k)) n) f(2^floor(lg(k)) n)
<= 2 k n f(k n),
which is twice the cost of the final multiplication. The other multiplications can be analyzed similarly.
I think the best integer pow is O(2*Log2(K)) when going through bits of k.
k can be written in binary form k = { kn-1,...k3,k2,k1,k0 }
so equation looks like this:
q^k = q^( 1*k0 +2*k1 +4*k2 +8*k3 ... )
q^k = q^k0 * q^(2*k1) * q^(4*k2) ....
so you have only n = ceil(log2(k)) steps to compute
each step multiply q^i to result if ki!=0
q(i+1)=qi*qi
q0 = q
simplified working code (does not handle special cases 0^k,k^0,0^0,...):
DWORD pow(DWORD q,DWORD k)
{
DWORD s;
for (s=1;k;k>>=1,q*=q)
if (k&1) s*=q;
return s;
}
of course if you want to use big q,k than bigint arithmetics is needed instead
i do not think that pairing low bits multiplications will be of much help
unless used for really big numbers
because the pairing and merging different bit-count sub-results is usually slower

Multipling two polynomials - Not exactly FFT

I'm currently studying FFT and I have trouble solving the following question:
Write a "divide and conquer" algorithm which multiplies two polynomials (max N degree) in complexity:
Theta of n^log3 (base 2 log ofc)
The algorithm should divide the two given polynomials coefficients into two groups:
Group1) Coefficients with even indexes.
Group2) Coefficients with odd indexes.
ehm, I don't even know how to start thinking about the solution..
The guidelines seem to be similar to FFT algorithm but I still can't see the solution.
Would love to get some assistance! even just a way to think about it..
please note that no code should be supplied.. only explanations and maybe pseudo code about how to get it done.
thanks !
Here are a few hints, then the solution.
1 -
First of all, you should make sure that you can perform the multiplication in less than n^2 coefficient multiplications, on a simple example :
(aX + b)*(cX + d)
One of your multiplications should be (a+b)*(c+d)
2 -
Haven't found how to do it ?
Here are the operations for each power :
X^2 : ac
X : (a+b)*(c+d) - ac - bd
1 : bd
You just have to perform 3 multiplications instead of 4. Additions do not cost that much compared to multiplications.
3 -
You are asked to find a solution in Theta(n^lg(3)). Here is a quick reminder of the 'Théorème Général' :
Let T(n) the cost of your algorithme for the polynoms with degree n.
With a 'divide to conquer strategy' which leads to :
T(n) = aT(n/b)+f(n)
If f(n)~O(n^lg_b(a)) then T(n) = Theta(n^lg_b(a))
You are looking for T(n) = Theta(n^lg_2(3)). This could mean that :
T(n)=3.T(n/2) + epsilon
If you split your polynoms in even and odd polynoms, they have half of the initial coefficients amount : n/2.
The formula shows you that you will perform three multiplications between the odd and even polynoms...
4 -
Consider to represent your polynom P(x) with degree n this way :
P(X) = X.A(X) + B(X)
A(X) and B(X) contain n/2 coefficients.
5 - Solution
P(X) = X.A(X) + B(X)
P'(X) = X.A'(X) + B'(X)
The coefficients of P*P'(X) is the sum of the coefficients of :
X^2.A.A'
X.(A.B'+A'.B) = X.[(A+B)(A'+B') - A.A' - B.B']
B.B'
So you have to call your multiplication algorithm on :
A and A'
A+B and A'+B'
B and B'
Then you can recombine coefficients with shifts and additions.
Cheers

Resources