Is this sub-quadratic time?

After timing an algorithm I was working on, these are the running times:

N        Time (s)
1000     0.0191
10000    0.1195
100000   1.4074
1000000  16.0707
The algorithm simply returns the points common to two 2D arrays, a and b.
This is the code that was used:
import time

def common_points(a, b):
    start = time.time()
    specialSuperSorting(a)  # Insertion sort: ~(1/4)N^2
    specialSuperSorting(b)  # Insertion sort: ~(1/4)N^2
    common = []
    for i in range(len(a)):
        x = a[i]
        # Binary search for the coordinates (x, y); returns True if found, False if not
        y = specialBinarySearch(b, x)
        if y:
            common.append(x)
    end = time.time()
    print(len(a), ' ', end - start)
    return common
I know I could have used a faster sorting algorithm ... but I just took the easier path to save time, as this was just an exercise.
So, is this sub-quadratic time? And how can I decide that based only on the table of N against T(N), without seeing the algorithm itself?

The runtime of an algorithm will be dominated by its slowest component. Insertion sort is already O(n^2) or quadratic, so your algorithm will be at least this slow. Assuming specialBinarySearch is O(log n), and you run it n times, this part of the algorithm will be O(n log n).
In summary, your algorithm runs in O(1/4 n^2 + 1/4 n^2 + n log n) = O(n^2). It's quadratic; the 1/4 doesn't change that. You can see this trend in your data, which grows much faster than linear or n log n, as a plot would make clear.
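For illustration only, here is a sketch of a sub-quadratic variant. It assumes the points are comparable tuples and swaps the asker's hypothetical specialSuperSorting/specialBinarySearch helpers for Python's built-in Timsort and the bisect module; the name common_points_fast is mine, not from the question:

```python
import bisect

def common_points_fast(a, b):
    # Sort one array with Timsort, O(n log n), then binary-search it
    # once per element of the other array: O(n log n) overall.
    b_sorted = sorted(b)
    common = []
    for x in a:
        i = bisect.bisect_left(b_sorted, x)
        if i < len(b_sorted) and b_sorted[i] == x:
            common.append(x)
    return common
```

Like the original, this preserves the order of the elements of a.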

Related

Big O Notation With Separated Procedures

Today marks the first day I begin to study algorithms and algorithm analysis, more specifically asymptotic analysis. But before I dive in, I have one simple question that needs clarification that I can't seem to find anywhere else. Given the code snippet below, what would the algorithm's complexity be in Big O notation?
// Linear Computation: O(n)
....
//Merge Sort: O(n log(n))
....
//Merge Sort: O(n log(n))
....
//Nested for loops iterating n times each: O(n^2)
....
My assumption would be Big O: O(n) + (2 * O(n log n)) + O(n^2), but by the definition of Big O do we simplify this further? Would we just call this program O(n^2), considering it is the worst of these terms and the whole can be upper-bounded by a constant c * n^2?
When calculating time complexities, we express them in Big-O terms. Since programs are large, it is neither practical nor useful to total up every term: in any expression consisting of big and small terms, changing a big term changes the value significantly, while changing a small term barely matters. For example, take the value 10000021. Change the leading 1 to a 2 and you get 20000021 (a huge change); change the trailing 1 to a 2 and you get 10000022 (a tiny change). Similarly, when a program contains an n^2 term, that term is what counts, rather than the O(n) or O(log n) terms.
Order (fastest-growing first): n!, 2^n, n^R, ..., n^3, n^2, n log n, n, log n
Consider the maximum term present in the program.
Usually, when calculating time complexities, a large input value is kept in mind, say n = 1000000.
When you say
Big O: O(n) + (2 * O(n log n)) + O(n^2),
then for this large n, the n and 2 * n log n terms will not grow as much as the n^2 term.
So O(n^2) decides the complexity for large n, and that's why the overall complexity given is O(n^2).
For any given C1, C2, C3, you can find a C such that, for sufficiently large n,
C1 n + C2 n log(n) + C3 n² <= C n²
The n^2 term dominates, so the big-O is going to be that. The fact that there are two n log n terms doesn't change anything, since there's a fixed number of them, and when n is big, n^2 is bigger than n log n.
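A quick numeric check makes the dominance argument concrete (the constant factors here are illustrative, not taken from the question): at n = 1,000,000, the lower-order terms are a vanishing share of the total.

```python
import math

n = 1_000_000
terms = {
    "n": float(n),
    "2 n log n": 2 * n * math.log2(n),
    "n^2": float(n) * n,
}
total = sum(terms.values())
for name, value in terms.items():
    # Each lower-order term contributes a negligible share of the total.
    print(f"{name:>10}: {value:.3g} ({100 * value / total:.4f}% of total)")
```

The n^2 term accounts for over 99.99% of the sum, which is exactly why the big-O simplifies to O(n^2).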

Running time complexity of bubble sort

I was looking at the bubble sort algorithm on Wikipedia; it seems that the worst case is O(n^2).
Let's take an array of size n:
a = [1, 2, 3, 4, 5, ..., n]
For any n elements, the total number of comparisons is therefore (n - 1) + (n - 2) + ... + 2 + 1 = n(n - 1)/2, or O(n^2).
Can anyone explain to me how n(n - 1)/2 equals O(n^2)? I am not able to understand how they came to the conclusion that the worst-case analysis of this algorithm is O(n^2).
They are looking at the case where n approaches infinity, so n(n - 1)/2 is practically the same as n * n / 2, or n^2 / 2.
And since they are only looking at how the running time grows as n increases, constants are irrelevant. In this case, when n doubles the algorithm takes four times longer to execute, so we end up with n^2, i.e. O(n^2).
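To see the n(n - 1)/2 count concretely, here is a minimal bubble sort instrumented to count comparisons (a sketch of the basic no-early-exit version, which performs exactly n(n - 1)/2 comparisons on any input):

```python
def bubble_sort_comparisons(items):
    # Basic bubble sort (no early exit); returns the comparison count.
    a = list(items)
    comparisons = 0
    for i in range(len(a) - 1):
        # Pass i bubbles the largest remaining element to the end,
        # using len(a) - 1 - i comparisons.
        for j in range(len(a) - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return comparisons

n = 10
print(bubble_sort_comparisons(range(n, 0, -1)))  # n(n-1)/2 = 45
```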

Prove 3-Way Quicksort Big-O Bound

For 3-way Quicksort (dual-pivot quicksort), how would I go about finding the Big-O bound? Could anyone show me how to derive it?
There's a subtle difference between finding the complexity of an algorithm and proving it.
To find the complexity of this algorithm, you can do as amit said in the other answer: you know that on average you split your problem of size n into three smaller problems of size n/3, so you will reach problems of size 1 in log_3(n) steps on average. With experience, you will start getting a feeling for this approach and be able to deduce the complexity of algorithms just by thinking about them in terms of the subproblems involved.
To prove that this algorithm runs in O(n log n) in the average case, you use the Master Theorem. To use it, you have to write the recurrence giving the time spent sorting your array. As we said, sorting an array of size n can be decomposed into sorting three arrays of size n/3 plus the time spent building them. This can be written as follows:
T(n) = 3T(n/3) + f(n)
Where T(n) is a function giving the resolution "time" for an input of size n (actually the number of elementary operations needed), and f(n) gives the "time" needed to split the problem into subproblems.
For 3-Way quicksort, f(n) = c*n because you go through the array, check where to place each item and eventually make a swap. This places us in Case 2 of the Master Theorem, which states that if f(n) = O(n^(log_b(a)) log^k(n)) for some k >= 0 (in our case k = 0) then
T(n) = O(n^(log_b(a)) log^(k+1)(n))
As a = 3 and b = 3 (we get these from the recurrence relation, T(n) = aT(n/b)), this simplifies to
T(n) = O(n log n)
And that's a proof.
Well, the same proof actually holds.
Each iteration splits the array into 3 sublists, on average the size of these sublists is n/3 each.
Thus, the number of iterations needed is log_3(n), because you need to find the number of times you can do (((n/3)/3)/3)... until you get to one. This gives the formula:
n/(3^i) = 1
Which is satisfied for i = log_3(n).
Each iteration is still going over all the input (but in a different sublist) - same as quicksort, which gives you O(n*log_3(n)).
Since log_3(n) = log(n)/log(3) = log(n) * CONSTANT, you get that the run time is O(nlogn) on average.
Note that even if you take a more pessimistic approach to the sublist sizes, by taking the minimum over a uniform distribution, you still get a first sublist of size n/4, a second of size n/2, and a last of size n/4 (the minimum and maximum of the uniform distribution). This again decays to log_k(n) iterations (with a different k > 2), which again yields O(n log n) overall.
Formally, the proof will be something like:
Each iteration takes at most c_1 * n ops to run, for each n > N_1, for some constants c_1, N_1. (This is the definition of big-O notation, together with the claim that each iteration is O(n) excluding recursion; convince yourself why this is true. Note that here, "iteration" means all the work done by the algorithm at a certain "level" of the recursion, not in a single recursive invocation.)
As seen above, you have log_3(n) = log(n)/log(3) iterations in the average case (taking the optimistic version here; the same principles apply to the pessimistic one).
Now, we get that the running time T(n) of the algorithm is:
for each n > N_1:
T(n) <= c_1 * n * log(n)/log(3)
T(n) <= c_1 * nlogn
By definition of big O notation, it means T(n) is in O(nlogn) with M = c_1 and N = N_1.
QED
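As a concrete sketch of the scheme being analyzed (deliberately simple, not in-place, and not the asker's code): two pivots split the remaining elements into three sublists, matching the T(n) = 3T(n/3) + O(n) recurrence on average.

```python
def dual_pivot_quicksort(a):
    # Two pivots p <= q split the remaining elements into three parts:
    # x < p, p <= x <= q, and x > q.
    if len(a) <= 2:
        return sorted(a)
    p, q = sorted((a[0], a[-1]))
    rest = a[1:-1]
    lo = [x for x in rest if x < p]
    mid = [x for x in rest if p <= x <= q]
    hi = [x for x in rest if x > q]
    return (dual_pivot_quicksort(lo) + [p]
            + dual_pivot_quicksort(mid) + [q]
            + dual_pivot_quicksort(hi))
```

Each call does O(n) work to build the three sublists, then recurses on pieces whose average size is about n/3 each, which is exactly the shape of problem the Master Theorem argument above handles.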

Sum of linear time algorithms is linear time?

Pardon me if the question is "silly". I am new to algorithmic time complexity.
I understand that if I have n numbers and I want to sum them, it takes n steps, which means the algorithm is O(n), or linear time; i.e., the number of steps taken increases linearly with the number of inputs, n.
If I write a new algorithm that does this summing 5 times, one after another, I understand that it is O(5n) = O(n) time, still linear (according to Wikipedia).
Question
If I have say 10 different O(n) time algorithms (sum, linear time sort etc). And I run them one after another on the n inputs.
Does this mean that overall this runs in O(10n) = O(n), linear time?
Yep, O(kn) = O(n) for any constant k.
If you start growing your problem and decide that your 10 linear ops are actually k linear ops, where k depends on the input (say, k is the length of a user-supplied array), it would then be incorrect to drop that information from the big-O.
It's best to work it through from the definition of big-O, then learn the rule of thumb once you've "proved" it correct.
If you have 10 O(n) algorithms, that means that there are 10 constants C1 to C10, such that for each algorithm Ai, the time taken to execute it is less than Ci * n for sufficiently large n.
Hence[*] the time taken to run all 10 algorithms for sufficiently large n is less than:
C1 * n + C2 * n + ... + C10 * n
= (C1 + C2 + ... + C10) * n
So the total is also O(n), with constant C1 + ... + C10.
Rule of thumb learned: the sum of a constant number of O(f(n)) functions is O(f(n)).
[*] proof of this left to the reader. Hint: there are 10 different values of "sufficient" to consider.
Yes, O(10n) = O(n). More generally, O(C * n) = O(n), where C is a constant; in this case C is 10. It might look like O(n^2) if C were equal to n, but that is not the situation here: C is a constant, so it does not change with n.
Also, note that in a sum of complexities, the highest-order (most complex) term determines the overall complexity. In this case it is O(n) + O(n) + ... + O(n), ten times; thus, it is O(n).
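A toy step counter (illustrative only; run_passes is a made-up name) makes the rule visible: k sequential O(n) passes over n items cost k * n steps, which is still linear in n for any fixed k.

```python
def run_passes(data, k):
    # k sequential O(n) passes over the same input: k * n steps total.
    steps = 0
    for _ in range(k):
        for _ in data:
            steps += 1
    return steps

print(run_passes(range(1000), 10))  # 10 * 1000 = 10000 steps
# Doubling n doubles the step count, for any fixed k:
print(run_passes(range(2000), 10))  # 20000 steps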

Complexity of recursive factorial program

What's the complexity of a recursive program to find factorial of a number n? My hunch is that it might be O(n).
If you take multiplication as O(1), then yes, O(N) is correct. However, note that multiplying two numbers of arbitrary length x is not O(1) on finite hardware -- as x tends to infinity, the time needed for multiplication grows (e.g. if you use Karatsuba multiplication, it's O(x ** 1.585)).
You can theoretically do better for sufficiently huge numbers with Schönhage-Strassen, but I confess I have no real-world experience with it. x, the "length" or "number of digits" of N (in whatever base; it doesn't matter for big-O anyway), grows as O(log N), of course.
If you mean to limit your question to factorials of numbers short enough to be multiplied in O(1), then there's no way N can "tend to infinity" and therefore big-O notation is inappropriate.
Assuming you're talking about the most naive factorial algorithm ever:
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
Yes, the algorithm is linear, running in O(n) time. This is the case because it executes once every time it decrements the value n, and it decrements the value n until it reaches 0, meaning the function is called recursively n times. This is assuming, of course, that both decrementing and multiplication are constant-time operations.
Of course, if you implement factorial some other way (for example, using addition recursively instead of multiplication), you can end up with a much more time-complex algorithm. I wouldn't advise using such an algorithm, though.
When you express the complexity of an algorithm, it is always as a function of the input size. It is only valid to assume that multiplication is an O(1) operation if the numbers that you are multiplying are of fixed size. For example, if you wanted to determine the complexity of an algorithm that computes matrix products, you might assume that the individual components of the matrices were of fixed size. Then it would be valid to assume that multiplication of two individual matrix components was O(1), and you would compute the complexity according to the number of entries in each matrix.
However, when you want to figure out the complexity of an algorithm to compute N! you have to assume that N can be arbitrarily large, so it is not valid to assume that multiplication is an O(1) operation.
If you want to multiply an n-bit number with an m-bit number the naive algorithm (the kind you do by hand) takes time O(mn), but there are faster algorithms.
If you want to analyze the complexity of the easy algorithm for computing N!
def factorial(N):
    f = 1
    for i in range(2, N + 1):
        f = f * i
    return f
then at the k-th step in the for loop, you are multiplying (k-1)! by k. The number of bits used to represent (k-1)! is O(k log k) and the number of bits used to represent k is O(log k). So the time required to multiply (k-1)! and k is O(k (log k)^2) (assuming you use the naive multiplication algorithm). Then the total amount of time taken by the algorithm is the sum of the time taken at each step:
sum k = 1 to N [k (log k)^2] <= (log N)^2 * (sum k = 1 to N [k]) =
O(N^2 (log N)^2)
You could improve this performance by using a faster multiplication algorithm, like Schönhage-Strassen which takes time O(n*log(n)*log(log(n))) for 2 n-bit numbers.
The other way to improve performance is to use a better algorithm to compute N!. The fastest one that I know of first computes the prime factorization of N! and then multiplies all the prime factors.
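You can check the claimed growth of operand sizes directly: the bit length of k! grows like Theta(k log k), which is why multiplication cannot be treated as O(1) here. This sketch uses math.factorial only to produce the values being measured:

```python
import math

# The number of bits needed to represent k! grows roughly like k log k,
# so the multiplications inside the factorial loop get steadily costlier.
for k in (10, 100, 1000):
    bits = math.factorial(k).bit_length()
    print(f"{k}! needs {bits} bits")
```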
The time complexity of recursive factorial can be derived from the code:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
So, the time complexity satisfies:

T(n) = T(n-1) + 3    (3 constant operations per call: the multiplication,
                      the subtraction, and the check on the value of n)
     = T(n-2) + 6    (second recursive call)
     = T(n-3) + 9    (third recursive call)
     ...
     = T(n-k) + 3k

until k = n, at which point:

     = T(n-n) + 3n
     = T(0) + 3n
     = 1 + 3n
In Big-O notation, T(n) is directly proportional to n; therefore, the time complexity of recursive factorial is O(n).
Each recursive call adds a frame to the call stack, so the space complexity is also O(n).
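The recurrence above can be checked by instrumenting the recursion to count the three assumed constant operations per call (factorial_steps is a name I've made up for this sketch):

```python
def factorial_steps(n):
    # Returns (n!, step count), charging 3 steps per call
    # (comparison, subtraction, multiplication), with T(0) = 1.
    if n == 0:
        return 1, 1
    value, steps = factorial_steps(n - 1)
    return n * value, steps + 3

value, steps = factorial_steps(5)
print(value, steps)  # 120 and 1 + 3*5 = 16
```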
