I'm learning a course about big O notation on Coursera. I watched a video about the big O of a Fibonacci algorithm (non-recursion method), which is like this:
Operation Runtime
create an array F[0..n] O(n)
F[0] <-- 0 O(1)
F[1] <-- 1 O(1)
for i from 2 to n: Loop O(n) times
F[i] <-- F[i-1] + F[i-2] O(n) => I don't understand this line, isn't it O(1)?
return F[n] O(1)
Total: O(n)+O(1)+O(1)+O(n)*O(n)+O(1) = O(n^2)
I understand every part except F[i] <-- F[i-1] + F[i-2] O(n) => I don't understand this line, isn't it O(1) since it's just a simple addition? Is it the same with F[i] <-- 1+1?
The explanation they give me is:"But the addition is a bit worse. And normally additions are constant time. But these are large numbers. Remember, the nth Fibonacci number has about n over 5 digits to it, they're very big, and they often won't fit in the machine word."
"Now if you think about what happens if you add two very big numbers together, how long does that take? Well, you sort of add the tens digit and you carry, and you add the hundreds digit and you carry, and add the thousands digit, you carry and so on and so forth. And you sort of have to do work for each digits place.
And so the amount of work that you do should be proportional to the number of digits. And in this case, the number of digits is proportional to n, so this should take O(n) time to run that line of code".
I'm still a bit confusing. Does it mean a large number affects time complexity too? For example a = n+1 is O(1) while a = n^50+n^50 isn't O(1) anymore?
Video link for anyone who needed more information (4:56 to 6:26)
Big-O is just a notation for keeping track of orders of magnitude. But when we apply that in algorithms, we have to remember "orders of magnitude of WHAT"? In this case it is "time spent".
CPUs are set up to execute basic arithmetic on basic arithmetic types in constant time. For most purposes, we can assume we are dealing with those basic types.
However if n is a very large positive integer, we can't assume that. A very large integer will need O(log(n)) bits to represent. Which, whether we store it as bits, bytes, etc, will need an array of O(log(n)) things to store. (We would need fewer bytes than bits, but that is just a constant factor.) And when we do a calculation, we have to think about what we will actually do with that array.
Now suppose that we're trying to calculate n+m. We're going to need to generate a result of size O(log(n+m)), which must take at least that time to allocate. Luckily the grade school method of long addition where you add digits and keep track of carrying, can be adapted for big integer libraries and is O(log(n+m)) to track.
So when you're looking at addition, the log of the size of the answer is what matters. Since log(50^n) = n * log(50) that means that operations with 50^n are at least O(n). (Getting 50^n might take longer...) And it means that calculating n+1 takes time O(log(n)).
Now in the case of the Fibonacci sequence, F(n) is roughly φ^n where φ = (1 + sqrt(5))/2 so log(F(n)) = O(n).
Related
Let's say that the algorithm involves iterating through a string character by character.
If I know for sure that the length of the string is less than, say, 15 characters, will the time complexity be O(1) or will it remain as O(n)?
There are two aspects to this question - the core of the question is, can problem constraints change the asymptotic complexity of an algorithm? The answer to that is yes. But then you give an example of a constraint (strings limited to 15 characters) where the answer is: the question doesn't make sense. A lot of the other answers here are misleading because they address only the second aspect but try to reach a conclusion about the first one.
Formally, the asymptotic complexity of an algorithm is measured by considering a set of inputs where the input sizes (i.e. what we call n) are unbounded. The reason n must be unbounded is because the definition of asymptotic complexity is a statement like "there is some n0 such that for all n ≥ n0, ...", so if the set doesn't contain any inputs of size n ≥ n0 then this statement is vacuous.
Since algorithms can have different running times depending on which inputs of each size we consider, we often distinguish between "average", "worst case" and "best case" time complexity. Take for example insertion sort:
In the average case, insertion sort has to compare the current element with half of the elements in the sorted portion of the array, so the algorithm does about n2/4 comparisons.
In the worst case, when the array is in descending order, insertion sort has to compare the current element with every element in the sorted portion (because it's less than all of them), so the algorithm does about n2/2 comparisons.
In the best case, when the array is in ascending order, insertion sort only has to compare the current element with the largest element in the sorted portion, so the algorithm does about n comparisons.
However, now suppose we add the constraint that the input array is always in ascending order except for its smallest element:
Now the average case does about 3n/2 comparisons,
The worst case does about 2n comparisons,
And the best case does about n comparisons.
Note that it's the same algorithm, insertion sort, but because we're considering a different set of inputs where the algorithm has different performance characteristics, we end up with a different time complexity for the average case because we're taking an average over a different set, and similarly we get a different time complexity for the worst case because we're choosing the worst inputs from a different set. Hence, yes, adding a problem constraint can change the time complexity even if the algorithm itself is not changed.
However, now let's consider your example of an algorithm which iterates over each character in a string, with the added constraint that the string's length is at most 15 characters. Here, it does not make sense to talk about the asymptotic complexity, because the input sizes n in your set are not unbounded. This particular set of inputs is not valid for doing such an analysis with.
In the mathematical sense, yes. Big-O notation describes the behavior of an algorithm in the limit, and if you have a fixed upper bound on the input size, that implies it has a maximum constant complexity.
That said, context is important. All computers have a realistic limit to the amount of input they can accept (a technical upper bound). Just because nothing in the world can store a yottabyte of data doesn't mean saying every algorithm is O(1) is useful! It's about applying the mathematics in a way that makes sense for the situation.
Here are two contexts for your example, one where it makes sense to call it O(1), and one where it does not.
"I decided I won't put strings of length more than 15 into my program, therefore it is O(1)". This is not a super useful interpretation of the runtime. The actual time is still strongly tied to the size of the string; a string of size 1 will run much faster than one of size 15 even if there is technically a constant bound. In other words, within the constraints of your problem there is still a strong correlation to n.
"My algorithm will process a list of n strings, each with maximum size 15". Here we have a different story; the runtime is dominated by having to run through the list! There's a point where n is so large that the time to process a single string doesn't change the correlation. Now it makes sense to consider the time to process a single string O(1), and therefore the time to process the whole list O(n)
That said, Big-O notation doesn't have to only use one variable! There are problems where upper bounds are intrinsic to the algorithm, but you wouldn't put a bound on the input arbitrarily. Instead, you can describe each dimension of your input as a different variable:
n = list length
s = maximum string length
=> O(n*s)
It depends.
If your algorithm's requirements would grow if larger inputs were provided, then the algorithmic complexity can (and should) be evaluated independently of the inputs. So iterating over all the elements of a list, array, string, etc., is O(n) in relation to the length of the input.
If your algorithm is tied to the limited input size, then that fact becomes part of your algorithmic complexity. For example, maybe your algorithm only iterates over the first 15 characters of the input string, regardless of how long it is. Or maybe your business case simply indicates that a larger input would be an indication of a bug in the calling code, so you opt to immediately exit with an error whenever the input size is larger than a fixed number. In those cases, the algorithm will have constant requirements as the input length tends toward very large numbers.
From Wikipedia
Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity.
...
In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows.
In practice, almost all inputs have limits: you cannot input a number larger than what's representable by the numeric type, or a string that's larger than the available memory space. So it would be silly to say that any limits change an algorithm's asymptotic complexity. You could, in theory, use 15 as your asymptote (or "particular value"), and therefore use Big-O notation to define how an algorithm grows as the input approaches that size. There are some algorithms with such terrible complexity (or some execution environments with limited-enough resources) that this would be meaningful.
But if your argument (string length) does not tend toward a large enough value for some aspect of your algorithm's complexity to define the growth of its resource requirements, it's arguably not appropriate to use asymptotic notation at all.
NO!
The time complexity of an algorithm is independent of program constraints. Here is (a simple) way of thinking about it:
Say your algorithm iterates over the string and appends all consonants to a list.
Now, for iteration time complexity is O(n). This means that the time taken will increase roughly in proportion to the increase in the length of the string. (Time itself though would vary depending on the time taken by the if statement and Branch Prediction)
The fact that you know that the string is between 1 and 15 characters long will not change how the program runs, it merely tells you what to expect.
For example, knowing that your values are going to be less than 65000 you could store them in a 16-bit integer and not worry about Integer overflow.
Do problem constraints change the time complexity of algorithms?
No.
If I know for sure that the length of the string is less than, say, 15 characters ..."
We already know the length of the string is less than SIZE_MAX. Knowing an upper fixed bound for string length does not make the the time complexity O(1).
Time complexity remains O(n).
Big-O measures the complexity of algorithms, not of code. It means Big-O does not know the physical limitations of computers. A Big-O measure today will be the same in 1 million years when computers, and programmers alike, have evolved beyond recognition.
So restrictions imposed by today's computers are irrelevant for Big-O. Even though any loop is finite in code, that need not be the case in algorithmic terms. The loop may be finite or infinite. It is up to the programmer/Big-O analyst to decide. Only s/he knows which algorithm the code intends to implement. If the number of loop iterations is finite, the loop has a Big-O complexity of O(1) because there is no asymptotic growth with N. If, on the other hand, the number of loop iterations is infinite, the Big-O complexity is O(N) because there is an asymptotic growth with N.
The above is straight from the definition of Big-O complexity. There are no ifs or buts. The way the OP describes the loop makes it O(1).
A fundamental requirement of big-O notation is that parameters do not have an upper limit. Suppose performing an operation on N elements takes a time precisely equal to 3E24*N*N*N / (1E24+N*N*N) microseconds. For small values of N, the execution time would be proportional to N^3, but as N gets larger the N^3 term in the denominator would start to play an increasing role in the computation.
If N is 1, the time would be 3 microseconds.
If N is 1E3, the time would be about 3E33/1E24, i.e. 3.0E9.
If N is 1E6, the time would be about 3E42/1E24, i.e. 3.0E18
If N is 1E7, the time would be 3E45/1.001E24, i.e. ~2.997E21
If N is 1E8, the time would be about 3E48/2E24, i.e. 1.5E24
If N is 1E9, the time would be 3E51/1.001E27, i.e. ~2.997E24
If N is 1E10, the time would be about 3E54/1.000001E30, i.e. 2.999997E24
As N gets bigger, the time would continue to grow, but no matter how big N gets the time would always be less than 3.000E24 seconds. Thus, the time required for this algorithm would be O(1) because one could specify a constant k such that the time necessary to perform the computation with size N would be less than k.
For any practical value of N, the time required would be proportional to N^3, but from an O(N) standpoint the worst-case time requirement is constant. The fact that the time changes rapidly in response to small values of N is irrelevant to the "big picture" behaviour, which is what big-O notation measures.
It will be O(1) i.e. constant.
This is because for calculating time complexity or worst-case time complexity (to be precise), we think of the input as a huge chunk of data and the length of this data is assumed to be n.
Let us say, we do some maximum work C on each part of this input data, which we will consider as a constant.
In order to get the worst-case time complexity, we need to loop through each part of the input data i.e. we need to loop n times.
So, the time complexity will be:
n x C.
Since you fixed n to be less than 15 characters, n can also be assumed as a constant number.
Hence in this case:
n = constant and,
(maximum constant work done) = C = constant
So time complexity is n x C = constant x constant = constant i.e. O(1)
Edit
The reason why I have said n = constant and C = constant for this case, is because the time difference for doing calculations for smaller n will become so insignificant (compared to n being a very large number) for modern computers that we can assume it to be constant.
Otherwise, every function ever build will take some time, and we can't say things like:
lookup time is constant for hashmaps
Why does the following code for each statement refer to big O constant (here I use 1 for the convention)?
I mean if the array size gets bigger the time complexity may get larger right? Also the number in total will get larger and larger, won't it affect the complexity?
Pseudocode:
def find_sum(given_array)
total = 0 # refers to O(1)
for each i in given array: #O(1)
total+=i #O(1)
return total #O(1)
TL;DR: Because the Big O notation is used to quantify an algorithm, with regards of how it behaves with an increment of its input.
I mean if the array size gets bigger the time complexity may get
larger right? Also the number in total will get larger and larger,
won't it affect the complexity?
You are mistaken the time taken by the algorithm with the time-complexity.
Let us start by clarifying what is Big O notation in the current context. From (source) one can read:
Big O notation is a mathematical notation that describes the limiting
behavior of a function when the argument tends towards a particular
value or infinity. (..) In computer science, big O notation is used to classify algorithms
according to how their run time or space requirements grow as the
input size grows.
Informally, in computer-science time-complexity and space-complexity theories, one can think of the Big O notation as a categorization of algorithms with a certain worst-case scenario concerning time and space, respectively. For instance, O(n):
An algorithm is said to take linear time/space, or O(n) time/space, if its time/space complexity is O(n). Informally, this means that the running time/space increases at most linearly with the size of the input (source).
So for this code:
def find_sum(given_array)
total = 0
for each i in given array:
total+=i
return total
the complexity is O(n) because with an increment of the input the complexity grows linear and not constant. More accurately Θ(n).
IMO it is not very accurate to find out the complexity like:
def find_sum(given_array)
total = 0 # refers to O(1)
for each i in given array: #O(1)
total+=i #O(1)
return total #O(1)
Since the Big O notation represents a set of functions with a certain asymptotic upper-bound; as one can read from source:
Big O notation characterizes functions according to their growth
rates: different functions with the same growth rate may be
represented using the same O notation.
More accurate would be :
def find_sum(given_array)
total = 0 # takes c1 time
for each i in given array:
total+=i # takes c2 time
return total # takes c3 time
So the time complexity would be c1 + n * c2 + c3, which can be simplified to n. And since both the lower and upper bounds of this function are the same we can use Θ(n) instead of O(n).
Why does the following code for each statement refer to big O constant (here I use 1 for the convention)?
Not sure, ask the person who wrote it. It seems clear the overall runtime is not O(1), so if that's the conclusion they arrive at, they are wrong. If they didn't mean to say that, what they wrote is either wrong or confusing.
I mean if the array size gets bigger the time complexity may get larger right?
Yes, it might. Indeed, here, it will, since you are at least iterating over the elements in the array. More elements in the array, more iterations of the loop. Straightforward.
Also the number in total will get larger and larger, won't it affect the complexity?
This is an interesting insight and the answer depends on how you conceive of numbers being represented. If you have fixed-length numeric representations (32-bit unsigned ints, double-precision floats, etc.) then addition is a constant-time operation. If you have variable-length representations (like a big integer library, or doing the algorithm by hand) then the complexity of adding would depend on the addition method used but would necessarily increase with number size (for regular add-with-carry, an upper logarithmic bound would be possible). Indeed, with variable-length representations, your complexity should at least include some parameter related to the size (perhaps max or average) of numbers in the array; otherwise, the runtime might be dominated by adding the numbers rather than looping (e.g., an array of two 1000^1000 bit integers would spend almost all time adding rather than looping).
No answer so far address the second question:
Also the number in total will get larger and larger, won't it affect the complexity?
which is very important and usually not accounted for.
The answer is, it depends on your computational model. If the underlying machine may add insanely arbitrarily large numbers in constant time, then no, it does not affect the time complexity.
A realistic machine, however, operates on values of fixed width. Modern computers happily add 64 bit quantities. Some may only add 16 bit-wide values at a time. A Turing machine - which is a base of the whole complexity theory - works with 1 bit at a time. In any case, once our numbers outgrow the register width, we must account for the fact that addition takes time proportional to the number of bits in the addends, which in this case is log(i) (or log(total), but since total grows as i*(i-1)/2, its bit width is approximately log(i*i) = 2 log(i)).
With this in mind, annotating
total+=i # O(log i)
is more prudent. Now the complexity of the loop
for each i in given array:
total+=i # O(log(i))
is sum[1..n] log(i) = log(n!) ~ n log(n). The last equality comes from the Stirling approximation of a factorial.
There is no way, that the loop:
for each i in given array:
total+=i
will run in O(1) time. Even if the size of input n is 1, asymptotic analysis will still indicate, that it runs in O(n), and not in O(1).
Asymptotic Analysis measures the time/space complexity in relation to the input size, and it does not necessarily show the exact number of operations performed.
Point, that O(1) is constant, does not mean that it's just one (1) operation, but rather it means, that this particular block (which takes O(1)) does not change when the input changes, and therefore, it has no correlation to the input, so it has a constant complexity.
O(n), on the other hand, indicates, that the complexity depends on n, and it changes depending on how the input n changes. O(n) is a linear relation, when input size and runtime have 1 to 1 correlation.
Correctly written comments would look like this:
def find_sum(given_array)
total = 0 #this is O(1), it doesn't depend on input
for each i in given array: #this is O(n), as loop will get longer as the input size gets longer
total+=i #again, O(1)
return total #again, O(1)
I am comparing two algorithms that determine whether a number is prime. I am looking at the upper bound for time complexity, but I can't understand the time complexity difference between the two, even though in practice one algorithm is faster than the other.
This pseudocode runs in exponential time, O(2^n):
Prime(n):
for i in range(2, n-1)
if n % i == 0
return False
return True
This pseudocode runs in half the time as the previous example, but I'm struggling to understand if the time complexity is still O(2^n) or not:
Prime(n):
for i in range(2, (n/2+1))
if n % i == 0
return False
return True
As a simple intuition of what big-O (big-O) and big-Θ (big-Theta) are about, they are about how changes the number of operations you need to do when you significantly increase the size of the problem (for example by a factor of 2).
The linear time complexity means that you increase the size by a factor of 2, the number of steps you need to perform also increases by about 2 times. This is what called Θ(n) and often interchangeably but not accurate O(n) (the difference between O and Θ is that O provides only an upper bound but Θ guarantees both upper and lower bounds).
The logarithmic time complexity (Θ(log(N))) means that when increase the size by a factor of 2, the number of steps you need to perform increases by some fixed amount of operations. For example, using binary search you can find given element in twice as long list using just one ore loop iterations.
Similarly the exponential time complexity (Θ(a^N) for some constant a > 1) means that if you increase that size of the problem just by 1, you need a times more operations. (Note that there is a subtle difference between Θ(2^N) and 2^Θ(N) and actually the second one is more generic, both lie inside the exponential time but neither of two covers it all, see wiki for some more details)
Note that those definition significantly depend on how you define "the size of the task"
As #DavidEisenstat correctly pointed out there are two possible context in which your algorithm can be seen:
Some fixed width numbers (for example 32-bit numbers). In such a context an obvious measure of the complexity of the prime-testing algorithm is the value being tested itself. In such case your algorithm is linear.
In practice there are many contexts where prime testing algorithm should work for really big numbers. For example many crypto-algorithms used today (such as Diffie–Hellman key exchange or RSA) rely on very big prime numbers like 512-bits, 1024-bits and so on. Also in those context the security is measured in the number of those bits rather than particular prime value. So in such contexts a natural way to measure the size of the task is the number of bits. And now the question arises: how many operations do we need to perform to check a value of known size in bits using your algorithm? Obviously if the value N has m bits it is about N ≈ 2^m. So your algorithm from linear Θ(N) converts into exponential 2^Θ(m). In other words to solve the problem for a value just 1 bit longer, you need to do about 2 times more work.
Exponential versus linear is a question of how the input is represented and the machine model. If the input is represented in unary (e.g., 7 is sent as 1111111) and the machine can do constant time division on numbers, then yes, the algorithm is linear time. A binary representation of n, however, uses about lg n bits, and the quantity n has an exponential relationship to lg n (n = 2^(lg n)).
Given that the number of loop iterations is within a constant factor for both solutions, they are in the same big O class, Theta(n). This is exponential if the input has lg n bits, and linear if it has n.
i hope this will explain you why they are in fact linear.
suppose you call function and see how many time they r executed
Prime(n): # 1 time
for i in range(2, n-1) #n-1-1 times
if n % i == 0 # 1 time
return False # 1 time
return True # 1 time
# overall -> n
Prime(n): # Time
for i in range(2, (n/2+1)) # n//(2+1) -1-1 time
if n % i == 0 # 1 time
return False # 1 time
return True # 1 time
# overall -> n/2 times -> n times
this show that prime is linear function
O(n^2) might be because of code block where this function is called.
So i came upon this question where:
we have to sort n numbers between 0 and n^3 and the answer of time complexity is O(n) and the author solved it this way:
first we convert the base of these numbers to n in O(n), therefore now we have numbers with maximum 3 digits ( because of n^3 )
now we use radix sort and therefore the time is O(n)
so i have three questions :
1. is this correct? and the best time possible?
2. how is it possible to convert the base of n numbers in O(n)? like O(1) for each number? because some previous topics in this website said its O(M(n) log(n))?!
3. and if this is true, then it means we can sort any n numbers from 0 to n^m in O(n) ?!
( I searched about converting the base of n numbers and some said its
O(logn) for each number and some said its O(n) for n numbers so I got confused about this too)
1) Yes, it's correct. It is the best complexity possible, because any sort would have to at least look at the numbers and that is O(n).
2) Yes, each number is converted to base-n in O(1). Simple ways to do this take O(m^2) in the number of digits, under the usual assumption that you can do arithmetic operations on numbers up to O(n) in O(1) time. m is constant so O(m^2) is O(1)... But really this step is just to say that the radix you use in the radix sort is in O(n). If you implemented this for real, you'd use the smallest power of 2 >= n so you wouldn't need these conversions.
3) Yes, if m is constant. The simplest way takes m passes in an LSB-first radix sort with a radix of around n. Each pass takes O(n) time, and the algorithm requires O(n) extra memory (measured in words that can hold n).
So the author is correct. In practice, though, this is usually approached from the other direction. If you're going to write a function that sorts machine integers, then at some large input size it's going to be faster if you switch to a radix sort. If W is the maximum integer size, then this tradeoff point will be when n >= 2^(W/m) for some constant m. This says the same thing as your constraint, but makes it clear that we're thinking about large-sized inputs only.
There is wrong assumption that radix sort is O(n), it is not.
As described on i.e. wiki:
if all n keys are distinct, then w has to be at least log n for a
random-access machine to be able to store them in memory, which gives
at best a time complexity O(n log n).
The answer is no, "author implementation" is (at best) n log n. Also converting these numbers can take probably more than O(n)
is this correct?
Yes it's correct. If n is used as the base, then it will take 3 radix sort passes, where 3 is a constant, and since time complexity ignores constant factors, it's O(n).
and the best time possible?
Not always. Depending on the maximum value of n, a larger base could be used so that the sort is done in 2 radix sort passes or 1 counting sort pass.
how is it possible to convert the base of n numbers in O(n)? like O(1) for each number?
O(1) just means a constant time complexity == fixed number of operations per number. It doesn't matter if the method chosen is not the fastest if only time complexity is being considered. For example, using a, b, c to represent most to least significant digits and x as the number, then using integer math: a = x/(n^2), b = (x-(a*n^2))/n, c = x%n (assumes x >= 0). (side note - if n is a constant, then an optimizing compiler may convert the divisions into a multiply and shift sequence).
and if this is true, then it means we can sort any n numbers from 0 to n^m in O(n) ?!
Only if m is considered a constant. Otherwise it's O(m n).
If I need to determine the algorithmic complexity of a process with a cost set by a given function, is it just a question of giving O(n^2 log n) - or whatever the big O happens to be?
Also, isn't big O just going to be the highest order of any term in the polynomial? If I'm asked to give a derivation I'm not sure what to provide because it seems a little trivial.
Last question, if I need to give the operation count for an algorithm and it's really straightforward - roughly like
array1, array2, array3 of size n
for i in n:
array2[i] = sqrt(array1[i])
array3[i] = array1[i]^2
For 'operation count' am I just counting up all my arithmetical operations and figuring out which ones (like sqrt) count as multiple operations, etc... Or can I just write that it's O(n)?
The algorithmic cost of a process is the costs of all the components of the process. For example, using your example, we can decompose the costs of everything
array1, array2, array3 of size n
This takes n time for each array, so a total of 3n time, which is in O(n).
for i in n:
This means that everything in the loop is multiplies by n.
array2[i] = sqrt(array1[i])
This takes O(1) time. Why? Accessing an array element is constant time. Taking the sqrt is constant time. And setting the value of an array element is constant time. So the whole operation is constant time.
array3[i] = array1[i] ^ 2
This takes O(1) time, for the same reasons as the previous operation.
So the whole running time is 3n + n * ( 1 + 1) (using rough math here, not exact), which is just in O(n) time. Does that help?
As for an actual derivation, there are specific techniques for this. Did you learn the precise mathematical definition of big-Oh notation?
This link describes the formal definition of big-Oh notation, and provides of an example of how to prove this stuff.