Check if array has duplicate, naive approach time analysis

Check if array has duplicate, naive approach time analysis - algorithm

From my textbook we have an algorithm that checks if there is a duplicate in an array, once the duplicate in the array is found, the method ends (this is the naive approach).
I am asked to find the Big Ω complexity in the worst case.
In this sort of example, the worst case would occur when the duplicates we are looking for are side by side in the end, or there is no duplicate in the array.
I am more confused of the run time on each line.
For example in the code i posted, the first for loop will run for (n-1) times were we have 9 inputs.
how many times, in n, would the second for loop run for in the worst case? It would check 36 times in this case.
I know that the worst case in Big O complexity would be (n^2).
I know Big O means that f(n) must at most reach (<=) g(n).
I know that Big Ω means that f(n) must at least reach (>=) g(n).
public class test2{
public static void main(String[] args){
int[] arrayint = new int[]{5,2,3,7,1,9,6,4,4};
for(int i = 0; i<arrayint.length-1; i++){
for(int j = i+1; j<arrayint.length;j++){
if(arrayint[i] == arrayint[j]){
System.out.println("duplicate found at " + i + " and " + j);
System.exit(0);
}
}
}
}
}

I know that the worst case in Big O complexity would be (n^2). I know
Big O means that f(n) must at most reach (<=) g(n). I know that Big Ω
means that f(n) must at least reach (>=) g(n).
Actually, big O means f(n) (for large enough inputs) must not reach c1 * g(n) - for some constant c1.
Similarly, for big Omega (Ω), f(n) must be higher than c2 * g(n) (again, for large enough inputs).
This means, it's fine to have the same g(n) for both big O and big Omega (Ω), because you can have:
c2 * g(n) <= f(n) <= c1 * g(n)
When this happens we say f(n) is in Ө(g(n)), which mean the asymptotic bound is tight. You can find more information in this thread about big Theta(Ө)
In your case, the worst case is when there are no duplicates in the array. In this case, both O(n^2) and Ω(n^2), which gives a tight bound of Ө(n^2).
As a side note, this problem is called the element distinctness problem, which is interesting because the problem itself (not the algorithm) has several asymptotic bounds, based on the computation model we use.

When i=0, then j will vary from 1 to 8. The inner loop executes 8 times.
When i=1, the inner loop executes 7 times (j = 2 to 8)
In general, when i=x, the inner loop executes n-i-1 times.
If you sum all the iterations, you get (n*(n-1))/2 iterations. So for 9 items, that's (9*(9-1))/2), or 36 times.

Related

What is the time complexity of Brute Force Algorithm of Kadane's Algorithm? [duplicate]

Here is a segment of an algorithm I came up with:
for (int i = 0; i < n - 1; i++)
for (int j = i; j < n; j++)
(...)
I am using this "double loop" to test all possible 2-element sums in a an array of size n.
Apparently (and I have to agree with it), this "double loop" is O(n²):
n + (n-1) + (n-2) + ... + 1 = sum from 1 to n = (n (n - 1))/2
Here is where I am confused:
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
(...)
This second "double loop" also has a complexity of O(n²), when it is clearly (at worst) much (?) better than the first.
What am I missing? Is the information accurate? Can someone explain this "phenomenon"?

(n (n - 1))/2 simplifies to n²/2 - n/2. If you use really large numbers for n, the growth rate of n/2 will be dwarfed in comparison to n², so for the sake of calculating Big-O complexity, you effectively ignore it. Likewise, the "constant" value of 1/2 doesn't grow at all as n increases, so you ignore that too. That just leaves you with n².
Just remember that complexity calculations are not the same as "speed". One algorithm can be five thousand times slower than another and still have a smaller Big-O complexity. But as you increase n to really large numbers, general patterns emerge that can typically be classified using simple formulae: 1, log n, n, n log n, n², etc.
It sometimes helps to create a graph and see what kind of line appears:
Even though the zoom factors of these two graphs are very different, you can see that the type of curve it produces is almost exactly the same.

Constant factors.
Big-O notation ignores constant factors, so even though the second loop is slower by a constant factor, they end up with the same time complexity.
Right there in the definition it tells you that you can pick any old constant factor:
... if and only if there is a positive constant M ...
This is because we want to analyse the growth rate of an algorithm - constant factors just complicates things and are often system-dependent (operations may vary in duration on different machines).
You could just count certain types of operations, but then the question becomes which operation to pick, and what if that operation isn't predominant in some algorithm. Then you'll need to relate operations to each other (in a system-independent way, which is probably impossible), or you could just assign the same weight to each, but that would be fairly inaccurate as some operations would take significantly longer than others.
And how useful would saying O(15n² + 568n + 8 log n + 23 sqrt(n) + 17) (for example) really be? As opposed to just O(n²).
(For the purpose of the below, assume n >= 2)
Note that we actually have asymptotically smaller (i.e. smaller as we approach infinity) terms here, but we can always simplify that to a matter of constant factors. (It's n(n+1)/2, not n(n-1)/2)
n(n+1)/2 = n²/2 + n/2
and
n²/2 <= n²/2 + n/2 <= n²
Given that we've just shown that n(n+1)/2 lies between C.n² and D.n², for two constants C and D, we've also just shown that it's O(n²).
Note - big-O notation is actually strictly an upper bound (so we only care that it's smaller than a function, not between two), but it's often used to mean Θ (big-Theta), which cares about both bounds.

From The Big O page on Wikipedia
In typical usage, the formal definition of O notation is not used
directly; rather, the O notation for a function f is derived by the
following simplification rules:
If f(x) is a sum of several terms, the
one with the largest growth rate is kept, and all others omitted
Big-O is used only to give the asymptotic behaviour - that one is a bit faster than the other doesn't come into it - they're both O(N^2)

You could also say that the first loop is O(n(n-1)/2). The fancy mathematical definition of big-O is something like:
function "f" is big-O of function "g" if there exists constants c, n such that f(x) < c*g(x) for some c and all x > n.
It's a fancy way of saying g is an upper bound past some point with some constant applied. It then follows that O(n(n-1)/2) = O((n^2-n)/2) is big-O of O(n^2), which is neater for quick analysis.

AFAIK, your second code snippet
for(int i = 0; i < n; i++) <-- this loop goes for n times
for(int j = 0; j < n; j++) <-- loop also goes for n times
(...)
So essentially, it's getting a O(n*n) = O(n^2) time complexity.
Per BIG-O theory, constant factor is neglected and only higher order is considered. that's to say, if complexity is O(n^2+k) then actual complexity will be O(n^2) constant k will be ignored.
(OR) if complexity is O(n^2+n) then actual complexity will be O(n^2) lower order n will be ignored.
So in your first case where complexity is O(n(n - 1)/2) will/can be simplified to
O(n^2/2 - n/2) = O(n^2/2) (Ignoring the lower order n/2)
= O(1/2 * n^2)
= O(n^2) (Ignoring the constant factor 1/2)

Precise Θ notation bound for the running time as a function

I'm studying for an exam, and i've come across the following question:
Provide a precise (Θ notation) bound for the running time as a
function of n for the following function
for i = 1 to n {
j = i
while j < n {
j = j + 4
}
}
I believe the answer would be O(n^2), although I'm certainly an amateur at the subject but m reasoning is the initial loop takes O(n) and the inner loop takes O(n/4) resulting in O(n^2/4). as O(n^2) is dominating it simplifies to O(n^2).
Any clarification would be appreciated.

If you proceed using Sigma notation, and obtain T(n) equals something, then you get Big Theta.
If T(n) is less or equal, then it's Big O.
If T(n) is greater or equal, then it's Big Omega.

theoretical analysis of comparisons

I'm first asked to develop a simple sorting algorithm that sorts an array of integers in ascending order and put it to code:
int i, j;
for ( i = 0; i < n - 1; i++)
{
if(A[i] > A[i+1])
swap(A, i+1, i);
for (j = n - 2; j >0 ; j--)
if(A[j] < A[j-1])
swap(A, j-1, j);
}
Now that I have the sort function, I'm asked to do a theoretical analysis for the running time of the algorithm. It says that the answer is O(n^2) but I'm not quite sure how to prove that complexity.
What I know so far is that the 1st loop runs from 0 to n-1, (so n-1 times), and the 2nd loop from n-2 to 0, (so n-2 times).
Doing the recurrence relation:
let C(n) = the number of comparisons
for C(2) = C(n-1) + C(n-2)
= C(1) + C(0)
C(2) = 0 comparisons?
C(n) in general would then be: C(n-1) + C(n-2) comparisons?
If anyone could guide my step by step, that would be greatly appreciated.

When doing a "real" big O - time complexity analysis, you select one operation that you count, obviously the one that dominates the running time. In your case you could either choose the comparison or the swap, since worst case there will be a lot of swaps right?
Then you calculate how many times this will be evoked, scaling to input. So in your case you are quite right with your analysis, you simply do this:
C = O((n - 1)(n - 2)) = O(n^2 -3n + 2) = O(n^2)
I come up with these numbers through reasoning about the flow of data in your code. You have one outer for-loop iterating right? Inside that for-loop you have another for-loop iterating. The first for-loop iterates n - 1 times, and the second one n - 2 times. Since they are nested, the actual number of iterations are actually the multiplication of these two, because for every iteration in the outer loop, the whole inner loop runs, doing n - 2 iterations.
As you might know you always remove all but the dominating term when doing time complexity analysis.
There is a lot to add about worst-case complexity and average case, lower bounds, but this will hopefully make you grasp how to reason about big O time complexity analysis.
I've seen a lot of different techniques for actually analyzing the expression, such as your recurrence relation. However I personally prefer to just reason about the code instead. There are few algorithms which have hard upper bounds to compute, lower bounds on the other hand are in general very hard to compute.

Your analysis is correct: the outer loop makes n-1 iterations. The inner loop makes n-2.
So, for each iteration of the outer loop, you have n-2 iterations on the internal loop. Thus, the total number of steps is (n-1)(n-2) = n^2-3n+2.
The dominating term (which is what matters in big-O analysis) is n^2, so you get O(n^2) runtime.
I personally wouldn't use the recurrence method in this case. Writing the recurrence equation is usually helpful in recursive functions, but in simpler algorithms like this, sometimes it's just easier to look at the code and do some simple math.

Do these two nested loops really have the same quadratic time complexity?

Here is a segment of an algorithm I came up with:
for (int i = 0; i < n - 1; i++)
for (int j = i; j < n; j++)
(...)
I am using this "double loop" to test all possible 2-element sums in a an array of size n.
Apparently (and I have to agree with it), this "double loop" is O(n²):
n + (n-1) + (n-2) + ... + 1 = sum from 1 to n = (n (n - 1))/2
Here is where I am confused:
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
(...)
This second "double loop" also has a complexity of O(n²), when it is clearly (at worst) much (?) better than the first.
What am I missing? Is the information accurate? Can someone explain this "phenomenon"?

(n (n - 1))/2 simplifies to n²/2 - n/2. If you use really large numbers for n, the growth rate of n/2 will be dwarfed in comparison to n², so for the sake of calculating Big-O complexity, you effectively ignore it. Likewise, the "constant" value of 1/2 doesn't grow at all as n increases, so you ignore that too. That just leaves you with n².
Just remember that complexity calculations are not the same as "speed". One algorithm can be five thousand times slower than another and still have a smaller Big-O complexity. But as you increase n to really large numbers, general patterns emerge that can typically be classified using simple formulae: 1, log n, n, n log n, n², etc.
It sometimes helps to create a graph and see what kind of line appears:
Even though the zoom factors of these two graphs are very different, you can see that the type of curve it produces is almost exactly the same.

Constant factors.
Big-O notation ignores constant factors, so even though the second loop is slower by a constant factor, they end up with the same time complexity.
Right there in the definition it tells you that you can pick any old constant factor:
... if and only if there is a positive constant M ...
This is because we want to analyse the growth rate of an algorithm - constant factors just complicates things and are often system-dependent (operations may vary in duration on different machines).
You could just count certain types of operations, but then the question becomes which operation to pick, and what if that operation isn't predominant in some algorithm. Then you'll need to relate operations to each other (in a system-independent way, which is probably impossible), or you could just assign the same weight to each, but that would be fairly inaccurate as some operations would take significantly longer than others.
And how useful would saying O(15n² + 568n + 8 log n + 23 sqrt(n) + 17) (for example) really be? As opposed to just O(n²).
(For the purpose of the below, assume n >= 2)
Note that we actually have asymptotically smaller (i.e. smaller as we approach infinity) terms here, but we can always simplify that to a matter of constant factors. (It's n(n+1)/2, not n(n-1)/2)
n(n+1)/2 = n²/2 + n/2
and
n²/2 <= n²/2 + n/2 <= n²
Given that we've just shown that n(n+1)/2 lies between C.n² and D.n², for two constants C and D, we've also just shown that it's O(n²).
Note - big-O notation is actually strictly an upper bound (so we only care that it's smaller than a function, not between two), but it's often used to mean Θ (big-Theta), which cares about both bounds.

From The Big O page on Wikipedia
In typical usage, the formal definition of O notation is not used
directly; rather, the O notation for a function f is derived by the
following simplification rules:
If f(x) is a sum of several terms, the
one with the largest growth rate is kept, and all others omitted
Big-O is used only to give the asymptotic behaviour - that one is a bit faster than the other doesn't come into it - they're both O(N^2)

You could also say that the first loop is O(n(n-1)/2). The fancy mathematical definition of big-O is something like:
function "f" is big-O of function "g" if there exists constants c, n such that f(x) < c*g(x) for some c and all x > n.
It's a fancy way of saying g is an upper bound past some point with some constant applied. It then follows that O(n(n-1)/2) = O((n^2-n)/2) is big-O of O(n^2), which is neater for quick analysis.

AFAIK, your second code snippet
for(int i = 0; i < n; i++) <-- this loop goes for n times
for(int j = 0; j < n; j++) <-- loop also goes for n times
(...)
So essentially, it's getting a O(n*n) = O(n^2) time complexity.
Per BIG-O theory, constant factor is neglected and only higher order is considered. that's to say, if complexity is O(n^2+k) then actual complexity will be O(n^2) constant k will be ignored.
(OR) if complexity is O(n^2+n) then actual complexity will be O(n^2) lower order n will be ignored.
So in your first case where complexity is O(n(n - 1)/2) will/can be simplified to
O(n^2/2 - n/2) = O(n^2/2) (Ignoring the lower order n/2)
= O(1/2 * n^2)
= O(n^2) (Ignoring the constant factor 1/2)

Big-O notation representation of a simple algorithm

How can I represent its complexity with Big-O notation? I am little bit confused since the second for loop changes according to the index of the outer loop. Is it still O(n^2)? or less complex? Thanks in advance
for (int k = 0; k<arr.length; k++){
for (m = k; m<arr.length; m++){
//do something
}
}

Your estimation comes from progression formula:
and, thus, is O(n^2). Why your case is progression? Because it's n + (n-1) + ... + 1 summation for your loops.

If you add all iterations of the second loop, you get 1+2+3+...+n, which is equal with n(n+1)/2 (n is the array length). That is n^2/2 + n/2. As you may already know, the relevant term in big-oh notation is the one qith the biggest power, and coeficients are not relevant. So, your complexity is still O(n^2).

well the runtime is cca half of the n^2 loop
but in big O notation it is still O(n^2)
because any constant time/cycle - operation is represented as O(1)
so O((n^2)/2) -> O((n^2)/c) -> O(n^2)
unofficially there are many people using O((n^2)/2) including me for own purposes (its more intuitive and comparable) ... closer to cycle/runtime
hope it helps

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio