I'm trying to find a way to loop through an array of integers of size N, multiply each of those integers by 128^((N-1) - i), where N is the length of the array and i is the index of the integer, and then add all those results together.
For example, an array input of [1, 2, 3, 4] would return 1 * (128^3) + 2 * (128^2) + 3 * (128^1) + 4 * (128^0).
My algorithm needs to run in O(N) time, but the exponent operation is expensive, as, for example, 2^3 takes three operations. So, I need to find a way to operate on each integer in the array in O(1) time, using only arithmetic operations (-, +, *, /, %). The most obvious (incorrect) way I could think of is simply multiplying each integer (N-i) times, but that does not take constant time. I was also thinking of using exponentiation by squaring, but this takes log_2(N-i) time for operating on each integer, which is not constant.
128 is 2^7, and multiplying a number by 128^k shifts its binary representation left by 7*k positions.
1 * (128^3) + 2 * (128^2) + 3 * (128^1) + 4 * (128^0)
= 1000000000000000000000 + 1000000000000000 + 110000000 + 100   (all four terms written in binary)
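A minimal Python sketch of the shift view, assuming non-negative entries (the function name `sum_by_shifts` is hypothetical, not from the answer). Each term d * 128^k is computed as `d << (7 * k)`, and the binary string of the result matches the four binary terms added above:

```python
def sum_by_shifts(digits):
    """Sum digits[i] * 128**(N-1-i) using left shifts (128 == 1 << 7)."""
    n = len(digits)
    total = 0
    for i, d in enumerate(digits):
        # multiplying by 128**k is a left shift by 7*k bits
        total += d << (7 * (n - 1 - i))
    return total

value = sum_by_shifts([1, 2, 3, 4])
print(value)               # 2130308
print(format(value, "b"))  # 1000001000000110000100
```

Note that on bignums a shift by 7*k bits still costs time proportional to the size of the shifted number, so this does not make each element O(1).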
To answer the title question: it's possible to prove that with a constant number of those operations, you can't make numbers big enough for sufficiently large exponents.
To answer the underlying question: you can use the polynomial evaluation method sometimes attributed to Horner: ((1 * 128 + 2) * 128 + 3) * 128 + 4. Note that unless you're modding by something, manipulating the bignums is still going to cost you Õ(n^2) time.
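A short Python sketch of Horner's rule for this problem. The function name and the optional `mod` parameter are assumptions added for illustration (the `mod` case corresponds to "modding by something"); with a modulus, every step works on small numbers, so each element costs O(1) arithmetic operations:

```python
def horner(digits, base=128, mod=None):
    """Evaluate ((d0 * base + d1) * base + d2) * base + ... with one multiply and one add per element."""
    acc = 0
    for d in digits:
        acc = acc * base + d
        if mod is not None:
            # reducing at every step keeps the accumulator small
            acc %= mod
    return acc

print(horner([1, 2, 3, 4]))            # 2130308
print(horner([1, 2, 3, 4], mod=1000))  # 308
```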
If you are indeed working with bignums, there's a more complicated divide and conquer method that should be faster assuming that bignum multiplication runs faster than the school method. The idea is to split the input in half, evaluate the lower and upper halves separately using recursion, and then put them together. On your example, this looks like
(1 * 128 + 2) * 128^2 + (3 * 128 + 4),
where we compute the term 128^2 (i.e., 128^(n/2)) by repeated squaring. The operation count is still O(n) since we have the recurrence
T(n) = 2 T(n/2) + O(log n),
which falls into Case 1 of the master theorem. In practice, the running time will be dominated by the large multiplications, with whatever asymptotic complexity the particular implementation has.
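A minimal recursive sketch of the divide-and-conquer evaluation in Python (the function name is hypothetical). On [1, 2, 3, 4] it evaluates exactly (1 * 128 + 2) * 128^2 + (3 * 128 + 4). The power base^(size of lower half) is delegated to Python's built-in `pow`, which in CPython computes integer powers by repeated squaring; a real bignum setting would plug in its own fast multiplication:

```python
def eval_divide_conquer(digits, base=128):
    """Evaluate sum(digits[i] * base**(len-1-i)) by splitting the list in half."""
    def rec(lo, hi):
        if hi == lo:
            return 0
        if hi - lo == 1:
            return digits[lo]
        mid = (lo + hi) // 2
        upper = rec(lo, mid)  # higher-order half
        lower = rec(mid, hi)  # lower-order half
        # the lower half has (hi - mid) digits, so scale the upper half by base**(hi - mid)
        return upper * pow(base, hi - mid) + lower
    return rec(0, len(digits))

print(eval_divide_conquer([1, 2, 3, 4]))  # 2130308
```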
I'm trying to understand a formula for when we should use quicksort. For instance, we have an array with N = 1_000_000 elements. If we search only once, we should use a simple linear search, but if we search 10 times, we should sort the array first in O(n log n). How can I determine the threshold, i.e. for which number of searches and which input size I should sort and then use binary search?
You want to solve an inequality that can roughly be described as
t * n > C * n * log(n) + t * log(n)
where t is the number of checks and C is some constant for the sort implementation (it should be determined experimentally). Once you have estimated this constant, you can solve the inequality numerically (with some uncertainty, of course).
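A minimal Python sketch, assuming base-2 logarithms and a placeholder default of C = 1 (the answer says C must be measured for your sort; the function name is hypothetical). Because the inequality is linear in t, it can be rearranged to t * (n - log n) > C * n * log n instead of being searched numerically:

```python
import math

def min_checks_to_sort(n, C=1.0):
    """Smallest integer t with t*n > C*n*log2(n) + t*log2(n)."""
    log_n = math.log2(n)
    # linear in t: t * (n - log n) > C * n * log n, and n > log2(n) for every n >= 1
    bound = C * n * log_n / (n - log_n)
    return math.floor(bound) + 1  # strict inequality: first integer above the bound

print(min_checks_to_sort(1_000_000))  # 20 with C = 1
```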
Like you already pointed out, it depends on the number of searches you want to do. A good threshold can come out of the following statement:
n*log[b](n) + x*log[2](n) <= x*n/2
where x is the number of searches, n is the input size, and b is the base of the logarithm for the sort, depending on the partitioning you use.
When this statement evaluates to true, you should switch methods from linear search to sort and search.
Generally speaking, a linear search through an unordered array will take n/2 steps on average, though this average will only play a big role once x approaches n. If you want to stick with big Omicron or big Theta notation then you can omit the /2 in the above.
Assuming n elements and m searches, with crude approximations
the cost of the sort will be C0.n.log n,
the cost of the m binary searches C1.m.log n,
the cost of the m linear searches C2.m.n,
with C2 ~ C1 < C0.
Now you compare
C0.n.log n + C1.m.log n vs. C2.m.n
or
C0.n.log n / (C2.n - C1.log n) vs. m
For reasonably large n, the breakeven point is about C0.log n / C2.
For instance, taking C0 / C2 = 5 and n = 1000000 (with base-2 logarithms) gives m ≈ 100.
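A quick Python check of the example, comparing the exact break-even expression with the large-n approximation. Taking C1 = C2 = 1 is an assumption (the answer only says C2 ~ C1 < C0), and the function names are illustrative:

```python
import math

def breakeven_exact(n, c0, c1, c2):
    """m at which C0*n*log n + C1*m*log n == C2*m*n (base-2 logs)."""
    log_n = math.log2(n)
    return c0 * n * log_n / (c2 * n - c1 * log_n)

def breakeven_approx(n, c0, c2):
    """Large-n approximation C0 * log n / C2."""
    return c0 * math.log2(n) / c2

n = 1_000_000
print(breakeven_exact(n, c0=5, c1=1, c2=1))  # close to 100
print(breakeven_approx(n, c0=5, c2=1))       # close to 100
```

For n this large the C1 * log n term in the denominator barely matters, which is why the approximation is good.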
You should plot the complexities of both operations.
Linear search: O(n) per search, i.e. O(k·n) for k searches
Sort and binary search: O(n log n + k log n) for k searches
In the plot, you will see for which values of n and k it makes sense to choose the one approach over the other.
This actually turned into an interesting question for me as I looked into the expected runtime of a quicksort-like algorithm when the expected split at each level is not 50/50.
The first question I wanted to answer was: for random data, what is the average split at each level? It surely must be greater than 50% (for the larger subdivision). Well, given an array of size N of random values, the smallest value has a subdivision of (1, N-1), the second smallest value has a subdivision of (2, N-2), and so on. I put this in a quick script:
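N = 10000
split = 0
for x in range(N):
    # size of the larger partition when the pivot has rank x
    split += max(x, N - x)
split /= N * N  # average fraction of the array in the larger partition
print(split)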
And got exactly 0.75 as an answer. I'm sure I could show that this is always the exact answer, but I wanted to move on to the harder part.
Now, let's assume that even a 25/75 split follows an n log n progression for some unknown logarithm base. That means that num_comparisons(n) = n * log_b(n), and the question is to find b via statistical means (since I don't expect that model to be exact at every step). We can do this with a clever application of least-squares fitting after we use a logarithm identity to get:
C(n) = n * log(n) / log(b)
where now the logarithm can have any base, as long as log(n) and log(b) use the same base. This is a linear equation just waiting for some data! So I wrote another script to generate an array of xs filled with n*log(n) and an array of ys filled with C(n), and used numpy to tell me the slope of that least-squares fit, which I expect to equal 1 / log(b). I ran the script and got b inside of [2.16, 2.3] depending on how high I set n (I varied n from 100 to 100'000'000). The fact that b seems to vary with n shows that my model isn't exact, but I think that's okay for this example.
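A pure-Python sketch of the fitting step (the answer used numpy; this version fits a line through the origin, since the model C(n) = n log(n) / log(b) has no intercept). The data below is synthetic, generated from the model itself with b = 2.3, only to check that the fit recovers b; in practice the comparison counts would be measured from your sort:

```python
import math

def fit_log_base(ns, comparisons):
    """Fit C(n) = slope * n*ln(n) by least squares through the origin; return b = exp(1/slope)."""
    xs = [n * math.log(n) for n in ns]
    ys = list(comparisons)
    # closed-form least-squares slope for a line through the origin
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return math.exp(1 / slope)  # slope = 1 / ln(b)

ns = [10**k for k in range(2, 8)]
fake_counts = [n * math.log(n) / math.log(2.3) for n in ns]
print(fit_log_base(ns, fake_counts))  # 2.3, up to floating-point rounding
```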
To actually answer your question now, with these assumptions, we can solve for the cutoff point of when: N * n/2 = n*log_2.3(n) + N * log_2.3(n). I'm just assuming that the binary search will have the same logarithm base as the sorting method for a 25/75 split. Isolating N you get:
N = n*log_2.3(n) / (n/2 - log_2.3(n))
If your number of searches N exceeds the quantity on the RHS (where n is the size of the array in question) then it will be more efficient to sort once and use binary searches on that.
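A small Python sketch that evaluates the cutoff formula above for a given array size, using the fitted base 2.3 as the default (the function name is illustrative). For n = 10^6 it comes out at roughly 33 searches:

```python
import math

def cutoff_searches(n, base=2.3):
    """N = n*log_b(n) / (n/2 - log_b(n)); above this many searches, sort first."""
    log_n = math.log(n, base)
    return n * log_n / (n / 2 - log_n)

print(cutoff_searches(1_000_000))  # roughly 33
```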
I'm currently going through the "Cracking the coding interview" textbook and I'm reviewing Big-O and runtime. One of the examples was as follows:
Print all positive integer solutions to the equation a^3 + b^3 = c^3 + d^3 where a, b, c, d are integers between 1 and 1000.
The pseudocode solution provided is:
n = 1000;
for c from 1 to n
    for d from 1 to n
        result = c^3 + d^3
        append (c, d) to list at value map[result]
for each result, list in map
    for each pair1 in list
        for each pair2 in list
            print pair1, pair2
The runtime is O(N^2)
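A direct Python translation of the book's pseudocode, for experimenting with small n. It collects the (pair1, pair2) combinations in a list instead of printing them; the function name is hypothetical:

```python
from collections import defaultdict

def equal_cube_sums(n):
    """Translate the pseudocode; collect (pair1, pair2) instead of printing."""
    sums = defaultdict(list)  # map: c^3 + d^3 -> list of (c, d)
    for c in range(1, n + 1):
        for d in range(1, n + 1):
            sums[c**3 + d**3].append((c, d))
    out = []
    for pairs in sums.values():  # for each result, list in map
        for pair1 in pairs:
            for pair2 in pairs:
                out.append((pair1, pair2))
    return out

matches = equal_cube_sums(12)
print(len(matches))  # 284
```

With n = 12 the only sum with two representations is 1729 = 1^3 + 12^3 = 9^3 + 10^3, so its list holds 4 ordered pairs and contributes 16 outputs.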
I'm not sure how O(N^2) is obtained, and after extensive googling and trying to figure out why, I still have no idea. My reasoning is as follows:
Top half is O(N^2) because the outer loop goes to n and inner loop executes n times each.
The bottom half I'm not sure how to calculate, but I got O(size of map) * O(size of list) * O(size of list) = O(size of map) * O(size of list^2).
O(N^2) + O(size of map) * O(size of list^2)
The 2 for loops adding the pairs to the list of the map = O(N) * O(N) b/c it's 2 for loops running N times.
The outer for loop for iterating through the map = O(2N-1) = O(N) b/c the size of the map is 2N - 1 which is essentially N.
The 2 for loops for iterating through the pairs of each list = O(N) * O(N) b/c each list is <= N
Total runtime: O(N^2) + O(N) * O(N^2) = O(N^3)
Not sure what I'm missing here
Could someone help me figure out how O(N^2) is obtained or why my solution is incorrect. Sorry if my explanation is a bit confusing. Thanks
Based on the first part of the solution, sum(size of lists) == N. This means that the second part (nested loop) cannot be more complex than O(N^2). As you said, the complexity is O(size of map)*O(size of list^2), but it should rather be:
O(size of map) + O(size of list1^2) + O(size of list2^2) + ...
This means that in the worst-case scenario we will get a map of size 1 and one list of size N, and the resulting complexity is O(1) + O(N^2) = O(N^2).
In other scenarios the complexity will be lower. For instance, if we have a map of 2 elements, we will get 2 lists with a total size of N. So the result will be:
O(2) + O(size of list1^2) + O(size of list2^2), where (size of list1) + (size of list2) == N
and we know from basic maths that X^2 + Y^2 <= (X+Y)^2 for positive numbers.
The complexity of the second part is O(sum of (length of list)^2 over the lists in map). The lengths of the lists vary, but we know that the sum of the lengths of the lists in map is n^2, since we added exactly n^2 pairs in the first part of the code. Since T(program) = O(n^2) + O(sum of length of lists in map) * O(sum of length of lists in map / size of map) = O(n^2) * O(sum of length of lists in map / size of map), it remains to show that (sum of length of lists in map) / (size of map) is O(1). Doing this requires quite a bit of number theory, and unfortunately I can't help you there. But do check out these links for more info on how you would go about it: https://en.wikipedia.org/wiki/Taxicab_number
https://math.stackexchange.com/questions/1274816/numbers-that-can-be-expressed-as-the-sum-of-two-cubes-in-exactly-two-different-w
http://oeis.org/A001235
This is a very interesting question! cdo256 made some good points, I will try to explain a bit more and complete the picture.
It is more or less obvious that the key questions are: how many integers exist that can be expressed as a sum of two positive cubes in k different ways (where k >= 2), and what is the possible size of k? This number determines the sizes of the lists which are values of map, which determine the total complexity of the program. Our "search space" is from 2 to 2 * 10^9, because c and d both iterate from 1 to 1000, so the sum of their cubes is at most 2 * 10^9. If none of the numbers in the range [2, 2 * 10^9] could be expressed as a sum of two cubes in more than one way, then the complexity of our program would be O(n^2). Why? Well, the first part is obviously O(n^2), and the second part depends on the sizes of the lists which are values of map. But in this case all lists have constant size (at most 2: the ordered pairs (c, d) and (d, c)), and there are Θ(n^2) keys in map, which gives O(n^2).
However, that is not the case: there is the famous example of the "taxicab number" 1729, so let us return to our main question, the number of different ways to express an integer as a sum of two cubes of positive integers. This is an active field of research in number theory, and a great summary is given in Joseph H. Silverman's article Taxicabs and Sums of Two Cubes. I recommend reading it thoroughly. Current records are given here. Some interesting facts:
smallest integer that can be expressed as a sum of two cubes of positive integers in three different ways is 87,539,319
smallest integer that can be expressed as a sum of two cubes of positive integers in four different ways is 6,963,472,309,248 (> 2*10^9)
smallest integer that can be expressed as a sum of two cubes of positive integers in six different ways is 24,153,319,581,254,312,065,344 (> 2*10^9)
As you can easily see e.g. here, there are only 2184 integers in the range [2, 2 * 10^9] that are expressible as a sum of two positive cubes in two or three different ways, and for k = 4, 5, ... these numbers are out of our range. Therefore, the number of keys in map is Θ(n^2) (about n^2/2, since (c, d) and (d, c) share a key), and each value list contains at most 3 distinct representations (at most 6 ordered pairs (c, d)), which implies that the complexity of the code
for each pair1 in list
    for each pair2 in list
        print pair1, pair2
is constant, so the total complexity is again O(n^2).
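The bound on the list sizes can be checked empirically with a short Python sketch (the function name is hypothetical). It counts how many ordered pairs (c, d) land on each key of the map; the largest list is expected to hold 6 ordered pairs, coming from numbers with three representations such as 87,539,319 = 167^3 + 436^3 = 228^3 + 423^3 = 255^3 + 414^3. Running it with n = 1000 takes a million loop iterations, so it is slow but feasible:

```python
from collections import defaultdict

def list_size_stats(n):
    """Return (number of keys, largest list size) for the map built by the pseudocode."""
    sizes = defaultdict(int)
    for c in range(1, n + 1):
        for d in range(1, n + 1):
            sizes[c**3 + d**3] += 1
    return len(sizes), max(sizes.values())

keys, largest = list_size_stats(500)
print(keys, largest)
```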
I know there is an O(log n) algorithm for calculating a^n, where a is an integer and n is a huge integer (the result probably needs to be taken modulo a prime MOD).
I'm wondering whether there is still an O(log n) algorithm to calculate
(a+sqrt(b))^n + (a-sqrt(b))^n (mod MOD)
The irrational part sqrt(b) looks not easy to handle in the exponential calculation. All I can do is to calculate a+sqrt(b) and a-sqrt(b) part separately and add them together then do the modular, but if n is huge, it is easy to overflow. Any ideas?
You can do that by computing (in Z_M[x] / ⟨x^2 - b⟩)
(a+x)^n+(a-x)^n mod (M, x^2-b)
where again you can use modular halving-and-squaring for the powers, where the intermediate results now are linear polynomials (over modular integers). Actually, you will only need one of the powers; the result is twice the constant coefficient.
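A minimal Python sketch of this first variant, representing an element p0 + p1*x of Z_M[x]/⟨x^2 - b⟩ as a pair (p0, p1); the function name is hypothetical. Only (a + x)^n is computed, and the answer is twice its constant coefficient:

```python
def power_sum_ring(a, b, n, mod):
    """(a+sqrt(b))^n + (a-sqrt(b))^n mod `mod`, via (a+x)^n in Z_mod[x]/<x^2 - b>."""
    def mul(p, q):
        # (p0 + p1 x)(q0 + q1 x) = p0 q0 + p1 q1 x^2 + (p0 q1 + p1 q0) x, with x^2 = b
        return ((p[0] * q[0] + b * p[1] * q[1]) % mod,
                (p[0] * q[1] + p[1] * q[0]) % mod)
    result, base = (1 % mod, 0), (a % mod, 1 % mod)
    while n > 0:  # halving-and-squaring on the exponent
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return 2 * result[0] % mod  # twice the constant coefficient

print(power_sum_ring(3, 8, 2, 10**9 + 7))  # 34
```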
Alternatively, these power combinations are the solution of the linear recursion of order 2
u[n+2] - 2*a*u[n+1] + (a^2-b)*u[n] = 0
where
u[0]=2 and u[1]=2*a
so that you can use fast matrix exponentiation of the system matrix of this recursion, again obtaining an O(log(n)) algorithm (disregarding bitsize).
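A Python sketch of the second variant (the function name is hypothetical). The 2x2 system matrix maps (u[k+1], u[k]) to (u[k+2], u[k+1]), and it is raised to the power n-1 by repeated squaring, so the number of 2x2 matrix products is O(log n):

```python
def power_sum_matrix(a, b, n, mod):
    """u[n] for u[k+2] = 2a*u[k+1] - (a^2 - b)*u[k], u[0] = 2, u[1] = 2a, computed mod `mod`."""
    def mat_mul(X, Y):
        return [[(X[0][0] * Y[0][0] + X[0][1] * Y[1][0]) % mod,
                 (X[0][0] * Y[0][1] + X[0][1] * Y[1][1]) % mod],
                [(X[1][0] * Y[0][0] + X[1][1] * Y[1][0]) % mod,
                 (X[1][0] * Y[0][1] + X[1][1] * Y[1][1]) % mod]]
    if n == 0:
        return 2 % mod
    # system matrix of the recursion: (u[k+1], u[k]) -> (u[k+2], u[k+1])
    step = [[2 * a % mod, -(a * a - b) % mod], [1, 0]]
    power = [[1, 0], [0, 1]]
    e = n - 1
    while e > 0:  # fast matrix exponentiation
        if e & 1:
            power = mat_mul(power, step)
        step = mat_mul(step, step)
        e >>= 1
    # (u[n], u[n-1]) = step^(n-1) applied to (u[1], u[0]) = (2a, 2)
    return (power[0][0] * (2 * a) + power[0][1] * 2) % mod

print(power_sum_matrix(3, 8, 2, 10**9 + 7))  # 34
```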
Example: As per the comment, take a=3, b=8, n=2 (and integers mod M=10^9+7, example is not large enough for that to matter)
In the first variant, compute u[n]=(a+x)^n mod (M, x^2-b), so
u[0]=1
u[1]=3+x
u[2]=(3+x)^2 mod (x^2-8)=9+6x+8=17+6x
and twice the constant term is 2*17=34
In the second variant, the recursion is (with 2*a=6, a^2-b=1)
u[n+2]-6*u[n+1]+u[n]=0
so that the first sequence elements are
u[0]=2
u[1]=6
u[2]=6*u[1]-u[0]=34
If you expand (a+sqrt(b))^n + (a-sqrt(b))^n you get
( a^n + nC1 a^(n-1) √b + nC2 a^(n-2) b + nC3 a^(n-3) b √b + ... )
+( a^n - nC1 a^(n-1) √b + nC2 a^(n-2) b - nC3 a^(n-3) b √b + ... )
= 2 a^n + 0 + 2 nC2 a^(n-2) b + 0 + 2 nC4 a^(n-4) b^2 + ...
so the terms involving the possibly irrational parts cancel. (nC2 etc are binomial coefficients).
The RHS of the above could be calculated fairly efficiently using integer arithmetic as you can relate each term in the sequence to the previous one. However there are n/2 terms so the calculation is O(n).
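A Python sketch of this O(n) approach, summing only the even-k binomial terms. It uses `math.comb` (Python 3.8+) for the binomial coefficients rather than relating each term to the previous one, and the function name is hypothetical:

```python
from math import comb

def power_sum_binomial(a, b, n, mod):
    """2 * sum over even k of C(n, k) a^(n-k) b^(k/2), reduced mod `mod`: O(n) terms."""
    total = 0
    for k in range(0, n + 1, 2):  # odd k carry a factor sqrt(b) and cancel
        total += comb(n, k) * pow(a, n - k, mod) * pow(b, k // 2, mod)
    return 2 * total % mod

print(power_sum_binomial(3, 8, 2, 10**9 + 7))  # 34
```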
As we know the result will be an integer, we can try running through the exponentiation by squaring algorithm, keeping track of the integer and fractional components. Write a+sqrt(b) = x + y, where x is an integer and y is the fractional part.
Squaring this gives x^2 + 2xy + y^2. Even though we are only interested in the integer part, we have a problem: 2xy + y^2 has an integer part as well. To calculate the integer part correctly we need to know a lot of digits of y, and for higher powers you need more and more digits of y to get the integer part right.
I don't think normal floating point multiplication would be good enough to calculate the terms for very large n.
I've read that operations such as addition/subtraction were linear time, and that "grade-school" long multiplication is n^2 time. Why is this true?
Isn't addition floor(log n) time, when n is the smaller operand? The same argument goes for subtraction, and for multiplication: if we make a program to do long multiplication instead of adding integers together, shouldn't the complexity be floor(log a) * floor(log b), where a and b are the operands?
The answer depends on what is "n." When they say that addition is O(n) and multiplication (with the naïve algorithm) is O(n^2), n is the length of the number, either in bits or some other unit. This definition is used because arbitrary precision arithmetic is implemented as operations on lists of "digits" (not necessarily base 10).
If n is the number being added or multiplied, the complexities would be log n and (log n)^2 for positive n, as long as the numbers are stored in log n space.
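A minimal Python sketch of addition on a list of "digits" (little-endian, base 10 by default; the function name is illustrative). It makes a single pass with a carry, so the work is proportional to the length of the numbers, i.e. O(n) when n is the number of digits:

```python
def add_digits(x, y, base=10):
    """Add two little-endian digit lists in one pass: O(max(len(x), len(y))) digit operations."""
    result, carry = [], 0
    for i in range(max(len(x), len(y))):
        s = carry + (x[i] if i < len(x) else 0) + (y[i] if i < len(y) else 0)
        result.append(s % base)
        carry = s // base
    if carry:
        result.append(carry)
    return result

print(add_digits([3, 7, 2], [2, 1]))  # 273 + 12 = 285 -> [5, 8, 2]
```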
The naive approach to multiplication of (for example) 273 x 12 is expanded out (using the distributive rule) as (200 + 70 + 3) x (10 + 2) or:
200 x 10 + 200 x 2
+ 70 x 10 + 70 x 2
+ 3 x 10 + 3 x 2
The idea of this simplification is to reduce the multiplications to something that can be done easily. For your primary school math, that would be working with digits, assuming you know the times tables from zero to nine. For bignum libraries where each "digit" may be a value from 0 to 9999 (for ease of decimal printing), the same rules apply, as long as you can multiply numbers less than 10,000 in (roughly) constant time.
Hence, if n is the number of digits, the complexity is indeed O(n^2), since the number of "constant" operations tends to rise with the product of the "digit" counts.
This is true even if your definition of digit varies slightly (such as being a value from 0 to 9999 or even being one of the binary digits 0 or 1).
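A Python sketch of grade-school multiplication on little-endian digit lists (helper names are illustrative). The inner statement runs len(x) * len(y) times, one "constant-time" digit product each, which is where the O(n^2) comes from; the `base` parameter lets you use 0 to 9999 "digits" as described above:

```python
def to_digits(n, base):
    """Little-endian list of base-`base` digits of a non-negative integer."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(r)
        if n == 0:
            return digits

def from_digits(digits, base):
    return sum(d * base**i for i, d in enumerate(digits))

def schoolbook_multiply(x, y, base=10):
    """Grade-school multiplication of little-endian digit lists: len(x) * len(y) digit products."""
    result = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            cur = result[i + j] + xi * yj + carry  # one "constant-time" digit product
            result[i + j] = cur % base
            carry = cur // base
        result[i + len(y)] += carry
    while len(result) > 1 and result[-1] == 0:
        result.pop()  # drop leading zeros
    return result

print(from_digits(schoolbook_multiply(to_digits(273, 10), to_digits(12, 10)), 10))  # 3276
```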
Multiplying two binary numbers takes n^2 time, yet squaring a number can be done more efficiently somehow. (with n being the number of bits) How could that be?
Or is it not possible? This is insanity!
There exist algorithms more efficient than O(N^2) to multiply two numbers (see Karatsuba, Pollard, Schönhage–Strassen, etc.)
The two problems "multiply two arbitrary N-bit numbers" and "Square an arbitrary N-bit number" have the same complexity.
We have
4*x*y = (x+y)^2 - (x-y)^2
So if squaring N-bit integers takes O(f(N)) time, then the product of two arbitrary N-bit integers can be obtained in O(f(N)) too. (that is 2x N-bit sums, 2x N-bit squares, 1x 2N-bit sum, and 1x 2N-bit shift)
And obviously we have
x^2 = x * x
So if multiplying two N-bit integers takes O(f(N)), then squaring a N-bit integer can be done in O(f(N)).
Any algorithm computing the product (resp the square) provides an algorithm to compute the square (resp the product) with the same asymptotic cost.
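The reduction from multiplication to squaring can be written in a few lines of Python. Here `square` is a hypothetical parameter standing in for any fast squaring routine; the default simply uses `v * v`:

```python
def multiply_via_squares(x, y, square=lambda v: v * v):
    """Compute x*y from 4xy = (x+y)^2 - (x-y)^2 using only a squaring routine."""
    # (x+y)^2 - (x-y)^2 is exactly 4xy, so the right shift by 2 is an exact division by 4
    return (square(x + y) - square(x - y)) >> 2

print(multiply_via_squares(1234, 5678))  # 7006652
```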
As noted in other answers, the algorithms used for fast multiplication can be simplified in the case of squaring. The gain will be on the constant in front of the f(N), and not on f(N) itself.
Squaring an n digit number may be faster than multiplying two random n digit numbers. Googling I found this article. It is about arbitrary precision arithmetic but it may be relevant to what you're asking. In it the authors say this:
In squaring a large integer, i.e. X^2 = (x_{n-1}, x_{n-2}, ..., x_1, x_0)^2, many cross-product terms of the form x_i * x_j and x_j * x_i are equivalent. They need to be computed only once and then left shifted in order to be doubled. An n-digit squaring operation is performed using only (n^2 + n)/2 single-precision multiplications.
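A Python sketch of the technique described in the quote, on little-endian decimal digit lists (the function name is hypothetical). Each cross product x_i * x_j with i < j is computed once and doubled with a left shift, and the function also returns how many digit multiplications it performed, which is (n^2 + n)/2:

```python
def square_digits(x, base=10):
    """Square a little-endian digit list, computing each cross product x_i*x_j (i < j) once."""
    n = len(x)
    acc = [0] * (2 * n)
    mults = 0
    for i in range(n):
        for j in range(i, n):
            p = x[i] * x[j]
            mults += 1
            # diagonal terms count once; cross terms are doubled (a left shift by 1)
            acc[i + j] += p if i == j else p << 1
    carry = 0
    for k in range(2 * n):  # propagate carries; x^2 always fits in 2n digits
        cur = acc[k] + carry
        acc[k] = cur % base
        carry = cur // base
    return acc, mults

digits, count = square_digits([3, 7, 2])  # 273
print(digits, count)  # [9, 2, 5, 4, 7, 0] 6
```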
Like others have pointed out, squaring can only be about 1.5X or 2X faster than regular multiplication between arbitrary numbers. Where does the computational advantage come from? It's symmetry. Let's calculate the square of 1011 and try to spot a pattern that we can exploit. u0:u3 represent the bits in the number from the most significant to the least significant.
1011 // u3 * u0 : u3 * u1 : u3 * u2 : u3 * u3
1011 // u2 * u0 : u2 * u1 : u2 * u2 : u2 * u3
0000 // u1 * u0 : u1 * u1 : u1 * u2 : u1 * u3
1011 // u0 * u0 : u0 * u1 : u0 * u2 : u0 * u3
If you consider the elements ui * ui for i = 0, 1, ..., 3 to form the diagonal and ignore them, you'll see that each element ui * uj for i ≠ j appears twice.
Therefore, all you need to do is calculate the product sum for elements below the diagonal and double it, with a left shift. You'd finally add the diagonal elements. Now you can see where the 2X speed up comes from. In practice, the speed-up is about 1.5X because of the diagonal and extra operations.
I believe you may be referring to exponentiation by squaring. This technique isn't used for multiplying, but for raising to a power x^n, where n may be large. Rather than multiply x by itself n times, one performs a series of squaring and multiplying operations which can be mapped to the binary representation of n. The number of multiplication operations (which are more expensive than additions for large numbers) is reduced from n to O(log n) with respect to the naive exponentiation algorithm.
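A minimal Python sketch of exponentiation by squaring that also counts its multiplications (the function name is illustrative). For each bit of n there is at most one squaring and one extra multiplication, so the count is at most about 2 * log2(n):

```python
def power_by_squaring(x, n):
    """Return (x**n, number of multiplications) using the binary digits of n."""
    result, mults = 1, 0
    while n > 0:
        if n & 1:  # this bit of n is set: multiply the current square in
            result *= x
            mults += 1
        n >>= 1
        if n:      # square for the next bit
            x *= x
            mults += 1
    return result, mults

print(power_by_squaring(3, 13))  # (1594323, 6)
```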
Do you mean multiplying a number by a power of 2? This is usually quicker than multiplying any two random numbers since the result can be calculated by simple bit shifting. However, bear in mind that modern microprocessors dedicate lots of brute force silicon to these types of calculations and most arithmetic is performed with blinding speed compared to older microprocessors.
I have it!
2 * 2
is more expensive than
2 << 1
(The caveat being it only works for one case.)
Suppose you want to expand out the multiplication (a+b)×(c+d). It splits up into four individual multiplications: a×c + a×d + b×c + b×d.
But if you want to expand out (a+b)², then it only needs three multiplications (and a doubling): a² + 2ab + b².
(Note also that two of the multiplications are themselves squares.)
Hopefully this just begins to give an insight into some of the speedups that are possible when performing a square over a regular multiplication.
First of all great question! I wish there were more questions like this.
So it turns out that the method I came up with is O(n log n) for general multiplication in the arithmetic complexity only. You can represent any two numbers X and Y as
X = x_{n-1} 2^{n-1} + ... + x_1 2^1 + x_0 2^0
Y = y_{m-1} 2^{m-1} + ... + y_1 2^1 + y_0 2^0
where
x_i, y_i ∈ {0, 1}
then
XY = sum_{k=0}^{m+n-2} r_k 2^k
where
r_k = sum_{i=0}^{k} x_i y_{k-i}   (with x_i = 0 for i >= n and y_j = 0 for j >= m)
which is just a straightforward application of FFT to find the values of r_k for each k in O((n + m) log(n + m)) time.
Then for each r_k you must determine how big the overflow is and add it up accordingly. For squaring a number this means O(n log n) arithmetic operations.
You can add up the r_k values more efficiently using the Schönhage–Strassen algorithm to obtain an O(n log n log log n) bit operation bound.
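A minimal Python sketch of the convolution idea for non-negative integers, using a plain recursive floating-point FFT from `cmath` (function names are hypothetical). It computes the r_k values from the bits of X and Y and then resolves the "overflow" by summing r_k * 2^k, a shortcut that leans on Python's big-integer addition rather than an explicit carry pass. Floating-point rounding limits this sketch to moderate sizes:

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def multiply_via_fft(x, y):
    """Multiply non-negative integers via the convolution r_k of their bits."""
    xb = [int(c) for c in reversed(bin(x)[2:])]  # x_0, x_1, ..., x_{n-1}
    yb = [int(c) for c in reversed(bin(y)[2:])]  # y_0, y_1, ..., y_{m-1}
    size = 1
    while size < len(xb) + len(yb):
        size *= 2
    fx = fft(xb + [0] * (size - len(xb)))
    fy = fft(yb + [0] * (size - len(yb)))
    conv = fft([p * q for p, q in zip(fx, fy)], invert=True)
    r = [round(v.real / size) for v in conv]  # r_k = sum_i x_i * y_{k-i}
    # each r_k may exceed 1; summing r_k * 2^k resolves the carries
    return sum(rk << k for k, rk in enumerate(r))

print(multiply_via_fft(11, 11))  # 121
```

Squaring a number is just `multiply_via_fft(x, x)`.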
The exact answer to your question is already posted by Eric Bainville.
However, you can get a much better bound than O(n^2) for squaring a number simply because there exist much better bounds for multiplying integers!
If you assume a fixed length equal to the word size of the machine, and that the number to be squared is in memory, a squaring operation requires only one load from memory, so it could be faster.
For arbitrary length integers, multiplication is typically O(N²) but there are algorithms which reduce this for large integers.
If you assume the simple O(N²) approach to multiply a by b, then for each bit in a you have to shift b and add it to an accumulator if that bit is one. For each bit in a you need 3N shifts and additions.
Note that
( x - y )² = x² - 2 xy + y²
Hence
x² = ( x - y )² + 2 xy - y²
If each y is the largest power of two not greater than x, this gives a reduction to a lower square, two shifts and two additions. As N is reduced on each iteration, you may get an efficiency gain ( the symmetry means it visits each point in a triangle rather than a rectangle ), but it's still O(N²).
There may be another better symmetry to exploit.
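A Python sketch of the reduction described above, for non-negative integers (the function name is hypothetical). With y = 2^k, the terms 2xy and y^2 become the shifts `x << (k + 1)` and `1 << (2 * k)`, and each recursive call strips the top bit of x, so it still does O(N) steps of O(N)-bit additions:

```python
def square_by_shifts(x):
    """x^2 via x^2 = (x - y)^2 + 2xy - y^2, y = largest power of two <= x (x >= 0)."""
    if x == 0:
        return 0
    k = x.bit_length() - 1  # y = 2^k
    rest = x - (1 << k)     # x - y: drops the top bit
    # 2xy = x << (k + 1) and y^2 = 1 << (2k) are pure shifts
    return square_by_shifts(rest) + (x << (k + 1)) - (1 << (2 * k))

print(square_by_shifts(66))  # 4356
```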
a^2 = (a+b)*(a-b) + b^2, e.g. 66^2 = (66+6)(66-6) + 6^2 = 72*60 + 36 = 4356
for a^n just use the power rule
66^4 = 4356^2
I would approach the problem by counting the single-bit multiplications in an N-bit multiplication.
For numbers A and B, let
the bits of A be A(n-1)A(n-2)........A(1)A(0),
the bits of B be B(n-1)B(n-2)........B(1)B(0).
For the square of A, the unique bit products generated will be
for A(0)->A(0)....A(n-1)
A(1)->A(1)....A(n-1) and so on
so the total operations will be
OP = n + n-1 + n-2 ....... + 1
Therefore OP = (n^2 + n)/2;
so the Asymptotic notation will be O(n^2)
and for multiplication of A and B n^2 unique multiplications will be generated
so the Asymptotic notation will be O(n^2)
The square root of 2^n is 2^(n/2), or 2^(n >> 1), so if your number is a power of two everything is totally simple once you know the power. To multiply is even simpler: 2^4 * 2^8 is 2^(4+8). There's no sense in these statements you've made.
If you have a binary number A, it can (always, proof left to the eager reader) be expressed as 2^n + B with 0 <= B < 2^n; this can be squared as 2^(2n) + 2^(n+1)B + B^2. We can then repeat the expansion until B equals zero. I haven't looked too hard at it, but intuitively, it feels as if you should be able to make a squaring function take fewer algorithmic steps than a general-purpose multiplication.
I think that you are completely wrong in your statements
Multiplying two binary numbers takes n^2 time
Multiplying two 32-bit numbers takes exactly one clock cycle. On a 64-bit processor, I would assume that multiplying two 64-bit numbers takes exactly 1 clock cycle. It wouldn't even surprise me if a 32-bit processor could multiply two 64-bit numbers in 1 clock cycle.
yet squaring a number can be done more efficiently somehow.
Squaring a number is just multiplying the number with itself, so that is just a simple multiplication. There is no "square" operation in the CPU.
Maybe you are confusing "squaring" with "multiplying by a power of 2". Multiplying by 2 can be implemented by shifting all the bits one position to the "left". Multiplying by 4 is shifting all the bits two positions to the "left". By 8, 3 positions. But this trick only applies to a power of two.