In calculating the efficiency of algorithms, I have read that the exponentiation operation is not considered to be an atomic operation (like multiplication).
Is it because exponentiation is the same as the multiplication operation repeated several times over?
In principle, you can pick any set of "core" operations on numbers that you consider to take a single time unit to evaluate. However, there are a couple of reasons why we typically don't count exponentiation as one of them.
Perhaps the biggest has to do with how large of an output you produce. Suppose you have two numbers x and y that are each d digits long. Then their sum x + y has (at most) d + 1 digits - barely bigger than what we started with. Their product xy has at most 2d digits - larger than what we started with, but not by a huge amount. On the other hand, the value x^y has roughly y*d digits, which can be significantly bigger than what we started with. (A good example of this: think about computing 100^100, which has about 200 digits!) This means that simply writing down the result of the exponentiation would require a decent amount of time to complete.
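You can see the difference in output sizes with a quick check (a small Python sketch I'm adding; the values are arbitrary examples):

```python
x, y = 10**99, 10**99            # two 100-digit numbers
print(len(str(x + y)))           # 100 digits: the sum barely grows
print(len(str(x * y)))           # 199 digits: the product roughly doubles the length
print(len(str(100**100)))        # 201 digits: the power dwarfs its inputs
```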
This isn't to say that you couldn't consider exponentiation to be a constant-time operation. Rather, I've just never seen it done.
(Fun fact: some theory papers don't consider multiplication to be a constant-time operation, since the complexity of a hardware circuit that multiplies two b-bit numbers grows quadratically with b. And some theory papers don't consider addition to be constant-time either, especially when working with variable-length numbers! It's all about context. If you're dealing with "smallish" numbers that fit into machine words, then we can easily count addition and multiplication as taking constant time. If you have huge numbers - say, large primes for RSA encryption - then the size of the numbers starts to impact the algorithm's runtime and implementation.)
This is a matter of definition. For example, in hardware design and big-integer processing, multiplication is not considered an atomic operation (see e.g. this analysis of the Karatsuba algorithm).
At the level relevant to general-purpose software design, on the other hand, multiplication can be considered a fairly fast operation on fixed-width numbers, implemented in hardware. Exponentiation, by contrast, is rarely implemented in hardware, and an upper bound on its complexity can only be given in terms of the exponent rather than the number of digits.
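To make the dependence on the exponent concrete, here is a minimal square-and-multiply sketch (my own illustration, not something from the original answer): the loop runs once per bit of the exponent e, and the operands being multiplied keep growing as well.

```python
def power(x, e):
    """Exponentiation by repeated squaring: O(log e) multiplications,
    but the intermediate products keep getting larger."""
    result = 1
    while e > 0:
        if e & 1:                # lowest bit of the exponent is set
            result *= x
        x *= x                   # square the base
        e >>= 1                  # drop the lowest exponent bit
    return result
```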
So there are some number theory applications where we need to reduce big numbers modulo something, and we can choose the modulus. There are two families that allow huge optimizations - Fermat numbers and Mersenne numbers.
So let's call an N-bit sequence a chunk. N is often not a multiple of the word size.
For Fermat, we have M=2^N+1, so 2^N=-1 mod M, so we take the chunks of the dividend and alternate adding and subtracting.
For Mersenne, we have M=2^N-1, so 2^N=1 mod M, so we sum the chunks of the dividend.
In either case, we will likely end up with a number that takes up 2 chunks. We can apply this algorithm again if needed and finally do a general modulo algorithm.
Fermat will make the result smaller on average due to the alternating addition and subtraction. A negative result isn't that computationally expensive: you just keep track of the sign and fix it in the final modulo step. But I'd think bignum subtraction is a little slower than bignum addition.
Mersenne sums all chunks, so the result is a little larger, but that can be fixed with a second iteration of the algorithm at next to no extra cost.
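For the Mersenne case, the chunk-summing reduction is only a few lines (a sketch I'm adding; the exponent 31 is just an example choice):

```python
def mod_mersenne(x, n=31):
    """Reduce x modulo M = 2**n - 1 by repeatedly folding in n-bit chunks."""
    m = (1 << n) - 1
    while x > m:
        x = (x & m) + (x >> n)   # low chunk plus all the higher chunks
    return 0 if x == m else x

# agrees with the built-in operator:
assert mod_mersenne(123456789012345) == 123456789012345 % (2**31 - 1)
```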
So in the end, which is faster?
Schönhage–Strassen uses Fermat. There might be factors other than performance that make Fermat better than Mersenne - or maybe it's just straight up faster.
If you need a prime modulus, you're going to make the decision based on the convenience of the size.
For example, 2^31-1 is often convenient on 64-bit architectures, since it fits pretty snugly into 32 bits and the product of two of them fits into a 64-bit word, either signed or unsigned.
On 32-bit architectures, 2^16+1 has similar advantages. It doesn't quite fit into 16 bits, of course, but if you treat 0 as a special case, then it's still pretty easy to multiply two residues in a 32-bit word.
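The size claim for 2^31-1 is easy to verify (a Python sketch I'm adding; Python ints don't overflow, so this just checks the bit counts a 64-bit machine relies on):

```python
M = 2**31 - 1
worst = (M - 1) * (M - 1)        # largest product of two fully reduced residues
print(worst.bit_length())        # 62 -- comfortably inside a 64-bit word, signed or unsigned
print(worst % M)                 # 1, since M - 1 is congruent to -1 mod M
```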
I have the following problem:
Under what circumstances can multiplication be regarded as a unit time
operation?
But I thought multiplication is always considered to be taking unit time. Was I wrong?
It depends on what N is. If N is the number of bits in an arbitrarily large number, then as the number of bits increases, it takes longer to compute the product. However, in most programming languages and applications, the size of numbers is capped at some reasonable number of bits (usually 32 or 64). In hardware, these numbers are multiplied in one step that does not depend on the size of the number.
When the number of bits is a fixed number, like 32, then it doesn't make sense to talk about asymptotic complexity, and you can treat multiplication as an O(1) operation in terms of whatever algorithm you're looking at. When N can become arbitrarily large, like with Java's BigInteger class, then multiplication depends on the size of those numbers, as does the memory required to store them.
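You can see the arbitrary-precision case in practice with a rough timing sketch (my own illustration in Python; the absolute numbers will vary by machine, but the growth is the point):

```python
import time

# Multiplying ever-larger arbitrary-precision integers gets slower as they grow,
# unlike a fixed 64-bit hardware multiply, which takes the same time regardless.
for bits in (10_000, 100_000, 1_000_000):
    x = (1 << bits) - 1               # a number with `bits` one-bits
    start = time.perf_counter()
    _ = x * x
    print(f"{bits:>9} bits: {time.perf_counter() - start:.6f} s")
```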
Only in those cases where you're performing operations on two numbers of a built-in numeric type (emphasis here - not going into the binary detail) can you simply assume that the operation being performed takes constant time.
It's not defined as unit time but, more strictly, as a constant time interval that doesn't change even if we increase the size of the numbers (although in reality the calculation does take slightly more time on larger numbers). These costs are generally considered trivial unless the numbers being multiplied are very large, like BigIntegers in Java.
But as soon as we move towards multiplying binary strings of arbitrary length, the complexity increases, and the naive method yields a complexity of O(n^2).
So, to improve on this, we use a divide-and-conquer multiplication known as Karatsuba's algorithm, which has a complexity of about O(n^1.59): it replaces the four sub-multiplications of the naive approach with three, at the cost of a few extra additions.
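For reference, here is a minimal Karatsuba sketch (my own illustration, split on decimal digits for readability; Python's built-in * already uses fast multiplication internally):

```python
def karatsuba(x, y):
    """Divide-and-conquer multiplication: three recursive multiplies instead of four."""
    if x < 10 or y < 10:                       # base case: a single-digit operand
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    p = 10 ** half
    a, b = divmod(x, p)                        # x = a*p + b
    c, d = divmod(y, p)                        # y = c*p + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    mid = karatsuba(a + b, c + d) - ac - bd    # equals a*d + b*c with one multiply
    return ac * p * p + mid * p + bd

assert karatsuba(12345678, 87654321) == 12345678 * 87654321
```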
I hope I haven't misjudged the question. If so, please alert me so that I can remove this answer. If I understood the question properly, then the other answer posted here seems incomplete.
The expression unit time is a little ambiguous (and AFAIK not much used).
True unit time is achieved when the multiply is performed in a single clock cycle. This rarely occurs on modern processors.
If the execution time of the multiply does not depend on the particular values of the operands, we can say that it is performed in constant time.
When the operand length is bounded, so that the time never exceeds a given duration, we also say that an operation is performed in constant time.
This constant duration can be used as the timing unit of the running time, so that you count in "multiplies" instead of seconds (ops, flops).
Lastly, you can evaluate the performance of an algorithm in terms of the number of multiplies it performs, independently of the time they take.
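As an example of counting in "multiplies" (my own illustration): evaluating a polynomial with Horner's rule costs exactly one multiply per coefficient after the first, regardless of how long each hardware multiply actually takes.

```python
def horner(coeffs, x):
    """Evaluate a polynomial from high-to-low coefficients.
    A degree-n polynomial (n + 1 coefficients) costs exactly n multiplies."""
    result = coeffs[0]
    for c in coeffs[1:]:
        result = result * x + c   # one multiply and one add per remaining coefficient
    return result

print(horner([3, 2, 1], 10))      # 3*10^2 + 2*10 + 1 = 321, using 2 multiplies
```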
Assume that I have two numbers a and b (a > b), and I divide a by b (i.e. calculate a/b). How much time does this take?
Well, people are commenting about the instruction set as well as the architecture, so here is the assumption.
Assume a and b are two integers, each of which has n bits, and we have a standard x86_64 machine with the standard instruction set.
A request was made to provide an answer rather than just a link, so I will have a go at this. As pointed out by phs above, there is a good link at https://en.wikipedia.org/wiki/Division_algorithm#Newton.E2.80.93Raphson_division.
Division is one of a number of operations which, as far as computational complexity theory is concerned, are no more expensive than multiplication. One of the reasons for this is that computational complexity theory only really cares about how the cost of an algorithm grows as the amount of data fed to it gets large, which in this case means multi-precision division. Another is that there is a faster algorithm for division than pen-and-paper long division - an algorithm that is in fact good enough to influence the design of computer hardware, with famous examples being the Cray-1 reciprocal iteration and the Pentium FDIV bug.
The fast way to do division is, instead of dividing a by b, to multiply a by 1/b, reducing the problem to computing a reciprocal. To compute 1/b, you first of all scale the problem by powers of two to get b into the range [1, 2), and make a first guess at the answer, typically from a lookup table - the Pentium bug had errors in its lookup table. Now you have an answer with lots of error - you have 1/b + x, where x is the error, which is unknown to you, but small if your lookup table was of a decent size.
The theory of Newton-Raphson iteration for solving equations tells you that if c = 1/b + x is a guess for 1/b, then c(2 - bc) is a better guess. Some algebra will tell you that this better guess works out as 1/b - bx^2. You have squared the error x, and since x was small (say 0.1 to start off with), you have roughly doubled the number of bits that are correct.
You are doubling the number of bits you have correct every time you do this, so it doesn't take many iterations to get a (good enough) answer. Now (here comes the neat part) because you know each iteration is only an approximation anyway, you need only calculate it to the accuracy that you reckon the approximation will give, not the full accuracy of the answer you want. Most of the underlying work is the multiplication in c(2 - bc), and this grows faster than linearly in the number of bits of accuracy you work to. When you sit down and work out the cost of all of this, you find that it grows quickly enough with the number of digits that the total cost looks like the sum 1 + 1/2 + 1/4 + 1/8 + ... - lots of terms, but converging to a total not too far off the very first one - and the cost of a multi-precision divide is no more than a constant factor above the cost of a multi-precision multiply.
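Here is what the iteration looks like in code (a floating-point sketch I'm adding; the crude initial guess stands in for the lookup table, and b is assumed to be pre-scaled into [1, 2)):

```python
def reciprocal_newton(b, iterations=6):
    """Newton-Raphson reciprocal: refine a guess c for 1/b via c = c*(2 - b*c).
    The error is squared each iteration, so the correct bits roughly double."""
    assert 1.0 <= b < 2.0            # caller is assumed to have scaled b
    c = 0.75                         # crude first guess for 1/b on [1, 2)
    for _ in range(iterations):
        c = c * (2.0 - b * c)
    return c

print(reciprocal_newton(1.5))        # ~0.6666666666666666
```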
I am currently reading the Introduction to Algorithms book and I have a question in regard to analyzing an algorithm:
According to the book, the word size is assumed to be c lg n bits for inputs of size n (this comes up in its analysis of merge sort), and it says that
We restrict c to be a constant so that the word size does not grow arbitrarily (If the word size could grow arbitrarily, we could store huge amounts of data in one word and operate on it all in constant time)
I do not understand the meaning of "constant" here. Could anyone explain clearly what this means?
Computational complexity in the study of algorithms deals with finding function(s) which provide upper and lower bounds for how much time (or space) the algorithm requires. Recall basic algebra in high school, where you learned the slope-intercept formula for a line? That formula, y = mx + b, provides two parameters, m (slope) and b (y-intercept), which describe a line completely. Those constants (m, b) describe where the line lies, and a larger slope means that the line is steeper.
Algorithmic complexity is just a way to describe the upper (and possibly lower) bounds for how long an algorithm takes to run (and/or how much space is required). With big-O (and big-Theta) notation, you are finding a function which provides upper (and lower) bounds for the algorithm costs. The constants are just shifting the curve, not changing the shape of the curve.
We restrict c to be a constant so that the word size does not grow arbitrarily (If the word size could grow arbitrarily, we could store huge amounts of data in one word and operate on it all in constant time)
On a physical computer, there is some maximum size to a machine word. On a 32-bit system, that would be 32 bits, and on a 64-bit system, it's probably 64 bits. Operations on machine words are (usually) assumed to take time O(1) even though they operate on lots of bits at the same time. For example, if you use a bitwise OR or bitwise AND on a machine word, you can think of it as performing 32 or 64 parallel OR or AND operations in a single unit of time.
When trying to build a theoretical model for a computing system, it's necessary to assume an upper bound on the maximum size of a machine word. If you don't do this, then you could claim that you could perform operations like "compute the OR of n values in time O(1)" or "add together two arbitrary-precision numbers in time O(1)," operations that you can't actually do on a real computer. Therefore, there's usually an assumption that the machine word has some maximum size so that if you do want to compute the OR of n values, you can still do so, but you can't do it instantaneously by packing all the values into one machine word and performing a single assembly instruction to get the result.
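A small sketch of the word-level view (my own illustration, assuming 64-bit words): ORing n bits that have been packed into words still takes about n/64 word operations, not a single step.

```python
def any_bit_set(words):
    """OR together n bits packed into 64-bit words: one O(1) word operation
    per word, so about n/64 steps in total -- not O(1) for arbitrary n."""
    acc = 0
    for w in words:
        acc |= w                     # a single machine-word OR covers 64 bits
    return acc != 0

print(any_bit_set([0x0, 0x0, 0x10, 0x0]))   # True: one of the 256 packed bits is set
```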
Hope this helps!
I am not really trying to optimize anything, but I remember hearing this from programmers all the time, so I took it as a truth. After all, they are supposed to know this stuff.
But I wonder: why is division actually slower than multiplication? Isn't division just a glorified subtraction, and multiplication a glorified addition? So mathematically I don't see why going one way or the other has such different computational costs.
Can anyone please clarify the reason/cause of this so I know, instead of what I heard from the other programmers I asked before, which is: "because".
The CPU's ALU (Arithmetic Logic Unit) executes algorithms, even though they are implemented in hardware. Classic multiplication algorithms include the Wallace tree and the Dadda tree. More information is available here. More sophisticated techniques are available in newer processors. Generally, processors strive to parallelize bit-pair operations in order to minimize the clock cycles required. Multiplication algorithms can be parallelized quite effectively (though more transistors are required).
Division algorithms can't be parallelized as efficiently. The most efficient division algorithms are quite complex (the Pentium FDIV bug demonstrates the level of complexity). Generally, they require more clock cycles per bit. If you're after more technical details, here is a nice explanation from Intel. Intel actually patented their division algorithm.
But I wonder why is division actually slower than multiplication? Isn't division just a glorified subtraction, and multiplication is a glorified addition?
The big difference is that in a long multiplication you just need to add up a bunch of numbers after shifting and masking. In a long division you have to test for overflow after each subtraction.
Let's consider a long multiplication of two n-bit binary numbers:
shift (no time)
mask (constant time)
add (naively looks like time proportional to n²)
But if we look closer it turns out we can optimise the addition by using two tricks (there are further optimisations but these are the most important).
We can add the numbers in groups rather than sequentially.
Until the final step we can add three numbers to produce two rather than adding two to produce one. While adding two numbers to produce one takes time proportional to n, adding three numbers to produce two can be done in constant time because we can eliminate the carry chain.
So now our algorithm looks like
shift (no time)
mask (constant time)
add numbers in groups of three to produce two until there are only two left (time proportional to log(n))
perform the final addition (time proportional to n)
In other words, we can build a multiplier for two n-bit numbers in time roughly proportional to n (and space roughly proportional to n²). As long as the CPU designer is willing to dedicate the logic, multiplication can be almost as fast as addition.
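The "add three numbers to produce two" step mentioned above is a carry-save adder; here is a bit-twiddling sketch of the idea (my own illustration), where each output bit depends only on the corresponding input bits, so there is no carry chain:

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three addends to a (sum, carry) pair in constant depth."""
    s = a ^ b ^ c                          # per-bit sum without carries
    carry = (a & b) | (a & c) | (b & c)    # per-bit majority: where a carry is generated
    return s, carry << 1                   # only the final s + carry needs a real carry chain

s, carry = carry_save_add(13, 9, 7)
assert s + carry == 13 + 9 + 7
```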
In long division we need to know whether each subtraction overflowed before we can decide what inputs to use for the next one, so we can't apply the same parallelising tricks as we can with long multiplication.
There are methods of division that are faster than basic long division but still they are slower than multiplication.