Runtime of bubble/simple sort

In class, simple sort is used as one of our first examples of an O(N) runtime...
But since it goes through one less iteration of the array every time it runs, wouldn't it be something more along the lines of...
Runtime bubble= sum(i = 0, n, (n-i)) ?
And when processes run one after another, isn't only the biggest one counted in asymptotic analysis, which here would be the pass over N elements? Why is this sort not O(N) by definition?

The sum of 1 + 2 + ... + N is N*(N+1)/2 ... (high school maths) ... and that approaches (N^2)/2 as N goes to infinity. Classic O(N^2).

I'm not sure where you (or your professor) got the notion that bubble sort is O(n). If your professor had a guaranteed O(n) sort algorithm, they'd be wise to try and patent it :-)
A bubble sort is, by its very nature, O(n^2).
That's because it has to make a full pass of the entire data set, to correctly place the first element.
Then a second pass of N - 1 elements to correctly place the second. And a third pass of N - 2 elements to correctly place the third.
And so on, effectively ending up with close to N * N / 2 operations which, removing the superfluous 0.5 constant, is O(n^2).
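To make the pass-by-pass count above concrete, here is a minimal sketch that counts comparisons in a plain bubble sort without an early exit. The function name `bubble_sort_comparisons` is just something I made up for illustration.

```python
def bubble_sort_comparisons(values):
    """Sort a copy of `values` with plain bubble sort and count comparisons."""
    data = list(values)
    n = len(data)
    comparisons = 0
    for done in range(n - 1):
        # After `done` passes, the last `done` elements are already in place,
        # so each pass is one comparison shorter than the previous one.
        for j in range(n - 1 - done):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons
```

For 10 elements this does 9 + 8 + ... + 1 = 45 comparisons, which is 10 * 9 / 2. The count grows quadratically even though each pass is shorter than the last.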

The time complexity of bubble sort is O(n^2).
When considering the complexity, only the largest term is considered (without its constant factor).

Related

Will it work to sort an array in O(n) when an element occurs more than n / 2 times?

Assuming the comparison of elements in array takes O(n), would it be possible to sort the array in O(n) if an element occurs more than n / 2 times?
Actually it should be possible because there is this median algorithm which will find the middle value of an array, or am I wrong?
No.
Assume the smallest element in the array occurs more than n / 2 times, let's say ceil(n / 2) + 1. Then there are still n - (ceil(n / 2) + 1) ~ n / 2 = O(n) elements to be sorted, which still takes O(n log n) time.
No.
Even if an element occurs more than n/999 times, the complexity is not O(n) because sorting the remaining elements is still O(n log n).
The complexity does not tell you how much time your algorithm will take to complete; it indicates how that time changes when you change n, even if the factor of this change is small.
However, some distributions can influence the complexity. For example, if you know that only m values are allowed, then an O(n) algorithm consists of using registers to count the occurrences of each allowed value.
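The counting idea from the last sentence could look roughly like this. It is only a sketch under the assumption that the allowed values are the integers 0 to m - 1; the name `counting_sort` and that encoding are my own choices, not something from the question.

```python
def counting_sort(values, m):
    """Sort integers known to lie in range(m) by counting occurrences."""
    counts = [0] * m              # one "register" per allowed value
    for v in values:
        counts[v] += 1            # single O(n) pass over the input
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)   # emit each value `count` times
    return result
```

The total work is O(n + m), so it is linear in n as long as m does not grow faster than n.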

Time complexity of an algorithm - n or n*n?

I'm trying to find out which is the Theta complexity of this algorithm.
(a is a list of integers)
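def sttr(a):
    s = []  # stack of indices; missing from the snippet as posted, assumed to start empty
    for i in range(0, len(a)):  # range replaces the Python 2 xrange
        # pop indices whose values are not larger than a[i]
        while s != [] and a[i] >= a[s[-1]]:
            s.pop()
        s.append(i)
    return s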
On the one hand, I can say that append is executed n times (n being the length of the array a), so pop is executed at most n times too, and the last thing I should consider is the while condition, which is probably evaluated at most 2n times.
From this I can say that this algorithm is at most 4*n so it is THETA(n).
But isn't it amortised analysis?
On the other hand I can say this:
There are 2 nested cycles. The for cycle is executed exactly n times. The while cycle could be executed at most n times, since I have to remove an item in each iteration. So the complexity is THETA(n*n).
I want to compute THETA but don't know which of these two options is correct. Could you give me advice?
The answer is THETA(n) and your arguments are correct.
This is not amortized analysis.
To get to amortized analysis you have to look at the inner loop. You can't easily say how fast the while loop will execute if you ignore the rest of the algorithm. The naive bound would be O(N), and that's correct since that's the maximum number of iterations. However, since we know that the total number of executions is O(N) (your argument) and that the loop is entered N times, we can say that the complexity of the inner loop is O(1) amortized.
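If you want to convince yourself, a quick sketch is to instrument the function and count the pops. `sttr_counted` is a hypothetical variant of the code in the question, with the stack assumed to start empty.

```python
def sttr_counted(a):
    """Same loop as in the question, but also count how often pop() runs."""
    s = []
    pops = 0
    for i in range(len(a)):
        while s and a[i] >= a[s[-1]]:
            s.pop()
            pops += 1
        s.append(i)   # every index is pushed exactly once
    return s, pops
```

Each index is pushed once and can be popped at most once, so `pops` can never exceed `len(a)` no matter how the pops are spread across the outer iterations.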

Worst case runtime for "Find the element repeated more than n/2 times" using random

There is a problem "Find the element repeated more than n/2 times"
Can you please help to estimate time complexity for solution that uses random:
1. Pick a random element in the array.
2. Iterate through the array and count the number of occurrences of the selected element.
3. If the count is > N/2, return that element.
4. Repeat from step 1.
What would be the worst case for this method, if I use perfect random generator that gives random uniform numbers? O(N²)?
My intuition says that on average it should give answer in two tries, but it is only average case. How to prove it? I'm not quite sure how to estimate running time for random algorithms.
Assuming that there is actually an element that appears more than n / 2 times, the expected running time is O(n). You can think about it this way - each time you choose an element, you need to do O(n) work to check whether it's the majority element. The question then is how many elements, on expectation, you're going to have to pick. Each time you choose an element randomly, you have at least 1/2 probability that you do pick something that's a majority element. On expectation, that means that you'll need to pick two elements before you find a majority element, so the runtime will be O(n). If you're curious why, notice that the probability that you find what you're looking for after exactly k probes (k > 0) is at most 2^(-k), since you need to have the first k - 1 probes not succeed and then have the kth probe succeed. You can then compute the expected value as
0 * 2^(-0) + 1 * 2^(-1) + 2 * 2^(-2) + ...
= 2
(This summation is known to work out to exactly two, though proving it is a bit messy.)
In the worst case, though, every time you pick an element, you'll choose something that isn't a majority element. This is extraordinarily unlikely, though - the probability that you don't find a majority element after k rounds is at most 2^(-k). For k = 300, this number is less than one over the number of atoms in the universe. Therefore, even though you can't bound the worst-case runtime, it's so astronomically unlikely that it's something you can safely ignore.
Hope this helps!
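For reference, the procedure being analysed can be sketched as below. This assumes a majority element really exists (otherwise the loop never ends), and the name `random_majority` and the optional `rng` parameter are my own additions for illustration.

```python
import random

def random_majority(values, rng=None):
    """Return (majority element, number of picks), assuming a majority exists."""
    rng = rng or random
    n = len(values)
    tries = 0
    while True:
        tries += 1
        candidate = rng.choice(values)        # uniform random pick
        # O(n) scan to count occurrences of the candidate
        if 2 * values.count(candidate) > n:   # strictly more than n/2
            return candidate, tries
```

Each round costs O(n) for the scan and succeeds with probability greater than 1/2, which is where the expected O(n) total comes from.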
There is no bound for the worst case running time of this algorithm.
The "perfect" random number generator's output cannot be contingent on the previous history; otherwise it would be imperfect (real-world pseudo-rngs are imperfect in this way, so you might be able to construct a real bound for a specific PRNG).
Consequently, it could take an arbitrary number of guesses before the RNG guesses one of the majority element's positions.
If you're allowed to rearrange the array, you could swap the wrong guess to the beginning of the (still unknown) part of the array, and then restrict the guesses to as-yet-unguessed positions. That would limit the number of iterations to n/2-1, so the worst-case running time for the algorithm would be O(n^2).
Although it has no impact on the big-O runtime, you can almost always stop the count scan early, either because you've already found n/2+1 elements or because there are not enough unscanned elements left to bring the count up to that threshold. Even with that optimization, the worst-case (alternating elements) time for a single scan is still n and the expected scan is still O(n).
For randomized algorithms, the expected run time better characterizes their run time. For the algorithm you described the expected run time is at most
S = n * 1/2 + 2n * 1/2^2 + 3n * 1/2^3 + ... up to infinity
We can solve this as follows:
S = n/2 + 2n/2^2 + 3n/2^3 + ... up to infinity
2S = n + 2n/2 + 3n/2^2 + 4n/2^3 + ... up to infinity
(subtracting the top from bottom)
S = n + n/2 + n/4 + n/8 + ... up to infinity
= 2n
So the expected runtime is O(n).
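If you don't trust the infinite sum, a small sketch with exact fractions shows the partial sums of k/2^k creeping up to 2 (so S approaches 2n). The helper name `partial_sum` is hypothetical.

```python
from fractions import Fraction

def partial_sum(terms):
    """Exact value of 1/2 + 2/2^2 + 3/2^3 + ... up to `terms` terms."""
    return sum(Fraction(k, 2 ** k) for k in range(1, terms + 1))
```

The partial sum after K terms equals 2 - (K + 2)/2^K, so the gap to 2 shrinks very quickly as K grows.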
If we talk about worst-case complexity, we mean a worst case for the input, i.e. an input that forces the algorithm into its worst possible running time.
This is the same for randomized algorithms. We calculate the expected complexity for a worst-case input.
In your example, a worst-case input would be an array of length n that contains a number a only ⌊n/2⌋+1 times.
The complexity is now O(n)⋅E[X], where X is the number of tries you have to randomly pick a number from the array until you pick a.
If a occurs m times in the array, then E[X] = n/m holds. So for our worst-case input we get E[X] = n/(⌊n/2⌋+1) < n/(n/2) = 2.
So this randomized algorithm has a worst case complexity of O(n).

Determining time complexity of an algorithm

Below is some pseudocode I wrote that, given an array A and an integer value k, returns true if there are two different integers in A that sum to k, and returns false otherwise. I am trying to determine the time complexity of this algorithm.
I'm guessing that the complexity of this algorithm in the worst case is O(n^2). This is because the first for loop runs n times, and the for loop within this loop also runs n times. The if statement makes one comparison and returns a value if true, which are both constant time operations. The final return statement is also a constant time operation.
Am I correct in my guess? I'm new to algorithms and complexity, so please correct me if I went wrong anywhere!
Algorithm ArraySum(A, n, k)
    for (i=0, i<n, i++)
        for (j=i+1, j<n, j++)
            if (A[i]+A[j]=k)
                return true
    return false
Azodious's reasoning is incorrect. The inner loop does not simply run n-1 times. Thus, you should not use (outer iterations)*(inner iterations) to compute the complexity.
The important thing to observe is that the inner loop's runtime changes with each iteration of the outer loop.
It is correct that the first time the inner loop runs, it will do n-1 iterations. But after that, the number of iterations decreases by one each time:
n - 1
n - 2
n - 3
…
2
1
We can use Gauss' trick (second formula) to sum this series to get n(n-1)/2 = (n² - n)/2. This is how many times the comparison runs in total in the worst case.
From this, we can see that the bound can not get any tighter than O(n²). As you can see, there is no need for guessing.
Note that you cannot provide a meaningful lower bound, because the algorithm may complete after any step. This implies the algorithm's best case is O(1).
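To check the count, here is a rough Python translation of the pseudocode that also returns how many comparisons were made. The name `array_sum` and the extra return value are my additions, not part of the original algorithm.

```python
def array_sum(a, k):
    """Return (found, comparisons) for the pair-sum search from the question."""
    n = len(a)
    comparisons = 0
    for i in range(n):
        for j in range(i + 1, n):     # inner loop shrinks as i grows
            comparisons += 1
            if a[i] + a[j] == k:
                return True, comparisons
    return False, comparisons
```

With 5 elements and no matching pair it makes 4 + 3 + 2 + 1 = 10 = 5 * 4 / 2 comparisons, and if the first two elements already sum to k it stops after a single one.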
Yes. In the worst case, your algorithm is O(n^2).
Your algorithm is O(n^2) because every input instance needs at most O(n^2) time.
Your algorithm is Ω(1) because there exists an input instance that needs only constant time.
The following appears in Chapter 3, Growth of Functions, of Introduction to Algorithms, co-authored by Cormen, Leiserson, Rivest, and Stein.
When we say that the running time (no modifier) of an algorithm is Ω(g(n)), we mean that no matter what particular input of size n is chosen for each value of n, the running time on that input is at least a constant times g(n), for sufficiently large n.
Given an input in which the summation of first two elements is equal to k, this algorithm would take only one addition and one comparison before returning true.
Therefore, this input takes constant time and makes the running time of this algorithm Ω(1).
No matter what the input is, this algorithm would take at most n(n-1)/2 additions and n(n-1)/2 comparisons before returning a value.
Therefore, the running time of this algorithm is O(n^2).
In conclusion, we can say that the running time of this algorithm falls between Ω(1) and O(n^2).
We could also say that the worst-case running time of this algorithm is Θ(n^2).
You are right but let me explain a bit:
This is because the first for loop runs n times, and the for loop within this loop also runs n times.
Actually, the second loop will run (n-i-1) times, but in terms of complexity it'll be taken as n only. (updated based on phant0m's comment)
So, in the worst-case scenario, it'll run n * (n-i-1) * 1 * 1 times, which is O(n^2).
In the best-case scenario, it runs 1 * 1 * 1 * 1 times, which is O(1), i.e. constant.

Shell sort running time on pre-sorted list (best case)

I'm confused on the running time of shell sort if the list is pre-sorted (best case). Is it O(n) or O(n log n)?
for (k = n/2; k > 0; k /= 2)
    for (i = k; i < n; i++)
        for (j = i; j >= k; j -= k)   /* j >= k so that a[0] is also compared */
            if (a[j-k] > a[j]) swap
            else break;
Shell sort is based on insertion sort, and insertion sort has O(n) running time for pre-sorted list, however, by introducing gaps (outermost loop), I don't know if it makes the running time of shell sort O(n log n) for pre-sorted list.
Thanks for the help
In the best case when the data is already ordered, the innermost loop will never swap. It will always immediately break, since the left value is known to be smaller than the right value:
for (k = n/2; k > 0; k /= 2)
    for (i = k; i < n; i++)
        for (j = i; j >= k; j -= k)
            if (false) swap
            else break;
So, the algorithm collapses to this:
for (k = n/2; k > 0; k /= 2)
    for (i = k; i < n; i++)
        no_op()
The best case then becomes:
O((n - n/2) + (n - n/4) + (n - n/8) + ... + (n - 1))
= O(n log(n) - n)
= O(n log(n))
That said, according to Wikipedia, some other variants of Shell Sort do have an O(N) best case.
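A minimal sketch to check that sum empirically, assuming the halving gap sequence n/2, n/4, ..., 1 from the question and an inner loop that runs while `j >= gap`. The function name `shell_sort_comparisons` is made up for this example.

```python
def shell_sort_comparisons(values):
    """Shell sort with halving gaps; returns (sorted copy, comparison count)."""
    a = list(values)
    n = len(a)
    comparisons = 0
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            j = i
            while j >= gap:
                comparisons += 1
                if a[j - gap] > a[j]:
                    a[j - gap], a[j] = a[j], a[j - gap]
                    j -= gap
                else:
                    break   # already in order: stop after one comparison
        gap //= 2
    return a, comparisons
```

On an already sorted list of 16 elements the gaps are 8, 4, 2, 1 and the count is 8 + 12 + 14 + 15 = 49, i.e. one comparison per element per gap.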
I think (at least as normally implemented) it's approximately O(n log n), though the exact number is going to depend on the progression you use.
For example, in the first iteration you invoke insertion sort, let's say, five times, each sorting every fifth element. Since each of these is linear on the number of elements sorted, you get linear complexity overall.
In the next iteration you invoke insertion sort, say, twice, sorting every other element. Again, linear overall.
In the third, you do insertion sort on every element, again linear.
In short, you have a linear algorithm invoked a (roughly) logarithmic number of times, so it should be about O(n log n) overall. That assumes some sort of geometric progression in the step sizes you use, which is common but (perhaps) not absolutely required.
If you're using log(n) compares for an array of length n, then you will have a time complexity of n log (n)
Otherwise, if you always use a constant amount of compares (such as 3), you will get O(n)
In general, if you use k gap values, your time complexity will be O(kn). People saying the best case is O(n log n) use log n gap values and people who say it's O(n) refer to always using a constant number of gap values regardless of the input.
The best case is O(n). Here's why:
Let's start with insertion sort. An already sorted list of n entries will require n minus 1 comparisons to complete (no exchanges necessary).
Put the insertion sort in the context of a shellsort with a single increment, 1. An already sorted list of n entries will require n minus the gap (1) comparisons.
Suppose you have two gaps 5 followed by 1 and n is greater than 5. An already sorted list will require n-5 comparisons to process the first gap (no exchanges necessary) plus n-1 comparisons for the second or 2n-6 (no exchanges necessary).
It doesn't matter if you used n as input to generate the gaps. You end up with each gap being a constant value c (the final c being 1).
So the algorithm for the best case is "n*number of gaps - the sum of all gaps".
I don't see how "n*number of gaps - ..." could be anything other than O(n).
I know most discussions put it as something else and I get the impression that no one has bothered to sit down and do the math. As you can see, it's not rocket science.
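The formula "n * number of gaps - sum of all gaps" is easy to evaluate for different gap sequences, which may help compare it with the other answers. This is only a sketch; `halving_gaps` and `sorted_input_comparisons` are hypothetical helper names, and every gap is assumed to be smaller than n.

```python
def halving_gaps(n):
    """Gap sequence n/2, n/4, ..., 1 as used in the question's code."""
    gaps = []
    g = n // 2
    while g > 0:
        gaps.append(g)
        g //= 2
    return gaps

def sorted_input_comparisons(n, gaps):
    """Comparisons on an already sorted list: n * (number of gaps) - sum(gaps)."""
    return n * len(gaps) - sum(gaps)
```

With the fixed gaps 5 and 1 this gives 2n - 6, for example 194 when n is 100. With halving gaps the number of gaps itself depends on n (10 gaps for n = 1024, giving 9217 comparisons), so whether the result counts as linear hinges on whether the number of gaps is treated as a constant.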

Resources