What are the best case and average case time complexities of finding the biggest element in an array?
I am confused, because I compared this algorithm with sequential search and the worst cases are the same (O(n)), but I guess the best case and average case are different: in the algorithm for finding the biggest value in an array we must compare all elements, while the sequential search algorithm stops as soon as the value is found. That means the best cases differ if I use the same array in those two programs and the biggest (or searched-for) value is the first element. But how can that happen when they have the same worst case?
For the sequential search, the best case is 1 comparison and the worst n comparisons (for a uniform distribution, the expected case is (n+1)/2). In fact the number of comparisons equals the 1-based index of the searched element (n if absent).
For the search of the maximum, n-1 comparisons are needed in all cases, as you must look at all values.
If the list is sorted in increasing order, a binary search finds an element in O(log n) comparisons, and returning the maximum takes 0 comparisons as it is known to be the last element.
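To make the comparison counts concrete, here is a small Python sketch (my own illustration, with function names I made up) that counts element comparisons for both routines:

```python
def sequential_search(a, x):
    """Return (index of x or -1, number of element comparisons)."""
    comparisons = 0
    for i, v in enumerate(a):
        comparisons += 1
        if v == x:
            return i, comparisons          # best case: 1 comparison
    return -1, comparisons                 # worst case: n comparisons

def find_max(a):
    """Return (maximum value, number of element comparisons)."""
    best = a[0]
    comparisons = 0
    for v in a[1:]:
        comparisons += 1                   # always n-1 comparisons, no early exit possible
        if v > best:
            best = v
    return best, comparisons

a = [7, 3, 9, 1, 4]
print(sequential_search(a, 7))   # (0, 1)  -> found at the first element
print(find_max(a))               # (9, 4)  -> n-1 comparisons regardless of layout
```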
I know that binary search has a time complexity of O(log n) to search for an element in a sorted array. But let's say that instead of selecting the middle element, we select a random element; how would that impact the time complexity? Will it still be O(log n) or will it be something else?
For example:
A traditional binary search in an array of size 18 will go down like 18 -> 9 -> 4 ...
My modified binary search pings a random element and decides to remove the right part or left part based on the value.
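For concreteness, the modified search might look like this in Python (my own sketch of the idea in the question, not code from it):

```python
import random

def random_pivot_search(a, x):
    """Search for x in sorted list a by probing a random index each step.
    Returns the index of x, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = random.randint(lo, hi)   # random probe instead of the middle element
        if a[mid] == x:
            return mid
        elif a[mid] < x:
            lo = mid + 1               # discard the left part
        else:
            hi = mid - 1               # discard the right part
    return -1

a = list(range(0, 36, 2))              # sorted array of size 18
print(random_pivot_search(a, 14))      # 7
```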
My attempt:
Let C(N) be the average number of comparisons required by a search among N elements. For simplicity, we assume that the algorithm only terminates when there is a single element left (no early termination on strict equality with the key).
As the pivot value is chosen at random, the probabilities of the remaining sizes are uniform and we can write the recurrence
C(N) = 1 + (1/N) . Sum(1 <= i <= N : C(i))
Then
N.C(N) - (N-1).C(N-1) = 1 + C(N)
and
C(N) - C(N-1) = 1 / (N-1)
The solution of this recurrence is the Harmonic series, hence the behavior is indeed logarithmic.
C(N) ~ ln(N-1) + gamma
Note that this is the natural logarithm, which beats the base-2 logarithm of the standard binary search by a factor of about 1.44!
My bet is that adding the early-termination test would further improve the base of the logarithm (and keep the logarithmic behavior), but at the same time double the number of comparisons per step, so that globally it would be worse in terms of comparisons.
Let us assume we have an array of size 18 and the number I am looking for is in the 1st spot. In the worst case, I always randomly pick the highest number (18 -> 17 -> 16 ...), effectively eliminating only one element in every iteration. So it becomes a linear search: O(n) time.
The recursion in the answer of @Yves Daoust relies on the assumption that the target element is located either at the beginning or the end of the array. In general, where the element lies in the array changes after each recursive call, making it difficult to write and solve the recursion. Here is another solution that proves the O(log n) bound on the expected number of recursive calls.
Let T be the (random) number of elements checked by the randomized version of binary search. We can write T=sum I{element i is checked} where we sum over i from 1 to n and I{element i is checked} is an indicator variable. Our goal is to asymptotically bound E[T]=sum Pr{element i is checked}. For the algorithm to check element i it must be the case that this element is selected uniformly at random from the array of size at least |j-i|+1 where j is the index of the element that we are searching for. This is because arrays of smaller size simply won't contain the element under index i while the element under index j is always contained in the array during each recursive call. Thus, the probability that the algorithm checks the element at index i is at most 1/(|j-i|+1). In fact, with a bit more effort one can show that this probability is exactly equal to 1/(|j-i|+1). Thus, we have
E[T]=sum Pr{element i is checked} <= sum_i 1/(|j-i|+1)=O(log n),
where the last equality follows from the bound on the harmonic series.
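As a sanity check (my own sketch, not part of the answer), E[T] can be estimated empirically and compared against the harmonic-series bound; with the target at one end of the array the two should roughly agree:

```python
import random

def checked_count(n, j):
    """Number of elements probed while searching for index j in a sorted array of size n."""
    lo, hi, checks = 0, n - 1, 0
    while lo <= hi:
        mid = random.randint(lo, hi)
        checks += 1
        if mid == j:
            return checks
        elif mid < j:
            lo = mid + 1
        else:
            hi = mid - 1
    return checks

n, j, trials = 1000, 0, 10000
avg = sum(checked_count(n, j) for _ in range(trials)) / trials
bound = sum(1.0 / (abs(j - i) + 1) for i in range(n))
print(avg, bound)   # both are close to H_n = ln(n) + gamma, about 7.49 for n = 1000
```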
If I have an unsorted array A[1.....n]:
1. using linear search to search for number x
2. using bubble sort to sort the array A in ascending order, then using binary search to search for number x in the sorted array
Which way will be more efficient, 1 or 2?
How to justify it?
If you need to search for a single number, nothing can beat a linear search: sorting cannot run faster than O(n), and even that is achievable only in special cases. Moreover, bubble sort is extremely inefficient, taking O(n^2) time. Binary search is faster than that, so the overall timing is going to be dominated by the sort's O(n^2).
Hence you are comparing O(n) to O(n^2); obviously, O(n) wins.
The picture would be different if you needed to search for k different numbers, where k is larger than n^2: sorting once and then answering every query with a binary search costs O(n^2 + k log n), which beats k linear searches at O(kn), so the comparison may very well come out the other way.
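As a rough illustration (my own sketch, not from the answer), here are the two approaches side by side; Python's `bisect` stands in for a hand-written binary search:

```python
import bisect

def linear_search(a, x):
    """Unsorted array: scan until x is found."""
    for i, v in enumerate(a):
        if v == x:
            return i
    return -1

# Searching for k numbers in an unsorted array of size n:
#   k linear searches:                 ~ k * n             operations
#   bubble sort + k binary searches:   ~ n^2 + k * log2(n) operations
# For a single query the linear search clearly wins; sorting first only
# starts to pay off once k is large relative to n.
a = [5, 3, 8, 1, 9, 2]
print(linear_search(a, 8))                     # 2

b = sorted(a)                                  # any sort works here; bubble sort would too
i = bisect.bisect_left(b, 8)
print(i if i < len(b) and b[i] == 8 else -1)   # 4 (index of 8 in the sorted copy)
```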
This is a practice exam question I'm working on. I have a general idea of what the answer is, but would like some clarification.
The following is a sorting algorithm for n integers in an array. In step 1, you iterate through the array, and compare each pair of adjacent integers and swap each pair if they are in the wrong order. In step 2, you repeat step 1 as many times as necessary until there is an iteration where no swaps are made (in which case the list is sorted and you can stop).
What is the worst case complexity of this algorithm?
What is the best case complexity of this algorithm?
Basically the algorithm presented here is a bubble sort.
The worst case complexity here is O(n^2).
The best case complexity is O(n).
Here is the explanation:
The best case situation here would be an already sorted array: all you need is n comparisons (to be precise, n-1), so the complexity is O(n).
The worst case situation is a reverse-ordered array.
To better understand why it's O(n^2), consider the smallest element of a reverse-ordered array, which starts at the last index but belongs at the first. Each pass of the algorithm explained in the question moves that element only one index towards its actual position (the first index here), and each pass requires O(n) comparisons. Hence it takes O(n^2) comparisons to move it to its actual position.
In the best case, no swapping will be required and a single pass of the array would suffice. So the complexity is O(n).
In the worst case, the elements of the array could be in reverse order. So the first iteration requires (n-1) swaps, the next one (n-2), and so on...
So it would lead to O(n^2) complexity.
As others have said, this is bubble sort. But if you are measuring complexity in terms of comparisons, you can easily be more precise than big-O.
In the best case, you need only compare n-1 pairs to verify they're all in the right order.
In the worst case, the last element is the one that should be in the first position, so n-1 passes will be needed, each advancing that element one more position toward the front of the list. Each pass requires n-1 comparisons. In all, then, about (n-1)^2 comparisons are needed (plus one final pass with no swaps to confirm the list is sorted).
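To see these counts in practice, here is a Python sketch (my own illustration) of the algorithm as described in the question, with full passes, a swap flag, and a comparison counter:

```python
def bubble_sort(a):
    """Bubble sort as described in the question: repeat full passes of
    adjacent-pair comparisons until a pass makes no swaps.
    Returns the number of comparisons performed."""
    comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(a) - 1):        # one full pass: n-1 comparisons
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
    return comparisons

n = 10
print(bubble_sort(list(range(n))))          # already sorted: n-1 = 9 comparisons
print(bubble_sort(list(range(n, 0, -1))))   # reverse order: n-1 sorting passes plus a
                                            # final check -> n*(n-1) = 90 comparisons
```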
First, I know the lower bound is O(n log n) and how to prove it.
And I agree the lower bound should be O(n log n).
What I don't quite understand is:
For some special cases, the # of comparisons could actually be even lower than the lower bound. For example, use bubble sort to sort an already sorted array. The # of comparisons is O(n).
So how to actually understand the idea of lower bound?
The classical definition on Wikipedia (http://en.wikipedia.org/wiki/Upper_and_lower_bounds) does not help much.
My current understanding of this is:
the lower bound of comparison-based sorting is actually the upper bound for the worst case,
namely, how well you can do in the worst case.
Is this correct? Thanks.
lower bound of the comparison-based sorting is actually the upper bound for the worst case.
No.
The function that you are bounding is the worst-case running time of the best possible sorting algorithm.
Imagine the following game:
We choose some number n.
You pick your favorite sorting algorithm.
After looking at your algorithm, I pick some input sequence of length n.
We run your algorithm on my input, and you give me a dollar for every executed instruction.
The O(n log n) upper bound means you can limit your cost to at most O(n log n) dollars, no matter what input sequence I choose.
The Ω(n log n) lower bound means that I can force you to pay at least Ω(n log n) dollars, no matter what sorting algorithm you choose.
Also: "The lower bound is O(n log n)" doesn't make any sense. O(f(n)) means "at most a constant times f(n)". But "lower bound" means "at least ...". So saying "a lower bound of O(n log n)" is exactly like saying "You can save up to 50% or more!" — it's completely meaningless! The correct notation for lower bounds is Ω(...).
The problem of sorting can be viewed as follows.
Input: A sequence of n numbers a1, a2, ..., an.
Output: A permutation (reordering) of the input sequence such that a'1 <= a'2 <= ... <= a'n.
A sorting algorithm is comparison based if it uses comparison operators to find the order between two numbers. Comparison sorts can be viewed abstractly in terms of decision trees. A decision tree is a full binary tree that represents the comparisons between elements that are performed by a particular sorting algorithm operating on an input of a given size. The execution of the sorting algorithm corresponds to tracing a path from the root of the decision tree to a leaf. At each internal node, a comparison ai <= aj is made. The left subtree then dictates subsequent comparisons for ai <= aj, and the right subtree dictates subsequent comparisons for ai > aj. When we come to a leaf, the sorting algorithm has established the ordering. So we can say the following about the decision tree.
1) Each of the n! permutations on n elements must appear as one of the leaves of the decision tree for the sorting algorithm to sort properly.
2) Let x be the maximum number of comparisons in a sorting algorithm. The maximum height of the decision tree would be x. A tree with maximum height x has at most 2^x leaves.
After combining the above two facts, we get the following relation:
n! <= 2^x
Taking log on both sides:
log2(n!) <= x
Since log2(n!) = Θ(n log n), we can say
x = Ω(n log n)
Therefore, any comparison-based sorting algorithm must make Ω(n log n) comparisons to sort the input array, and heapsort and merge sort are asymptotically optimal comparison sorts.
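As a quick numeric illustration (my own addition, not part of the answer), the bound n! <= 2^x can be evaluated directly: the worst-case number of comparisons x must be at least ceil(log2(n!)), which grows as Θ(n log n):

```python
import math

# Numeric check of the decision-tree bound n! <= 2^x:
# the worst-case comparison count x is at least ceil(log2(n!)).
for n in (4, 8, 16, 64):
    lower_bound = math.ceil(math.log2(math.factorial(n)))
    print(f"n={n}: at least {lower_bound} comparisons (n*log2(n) ~ {n * math.log2(n):.0f})")
```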
When you do asymptotic analysis you derive an O or Θ or Ω bound over all inputs.
But you can also analyze whether properties of the input affect the runtime.
For example, algorithms that take as input something almost sorted perform better than the formal asymptotic formula suggests, due to the input characteristics and the structure of the algorithm. Examples are bubble sort and quicksort.
It is not that you can go below the lower bounds; it is only the behavior of the implementation on specific inputs.
Imagine all the possible arrays of things that could be sorted. Let's say they are arrays of length 'n' and ignore stuff like arrays with one element (which, of course, are always already sorted).
Imagine a long list of all possible value combinations for that array. Notice that we can simplify this a bit, since the values in the array always have some sort of ordering. So if we replace the smallest one with the number 1, the next one with 1 or 2 (depending on whether it's equal or greater) and so forth, we end up with the same sorting problem as if we allowed any value at all. (This means an array of length n will need, at most, the numbers 1-n. Maybe less if some are equal.)
Then put a number beside each one telling how much work it takes to sort that array with those values in it. You could put several numbers. For example, you could put the number of comparisons it takes. Or you could put the number of element moves or swaps it takes. Whatever number you put there indicates how many operations it takes. You could put the sum of them.
One thing you have to do is ignore any special information. For example, you can't know ahead of time that the arrangement of values in the array is already sorted. Your algorithm has to do the same steps with that array as with any other. (But the first step could be to check if it's sorted. Usually that doesn't help in sorting, though.)
So. The largest number, measured by comparisons, is the number of comparisons needed when the values are arranged in a pathologically bad way. The smallest number, similarly, is the number of comparisons needed when the values are arranged in a really good way.
For a bubble sort, the best case (shortest or fastest) is if the values are in order already. But that's only if you use a flag to tell whether you swapped any values. In that best case, you look at each adjacent pair of elements one time, find they are already sorted, and when you get to the end, you find you haven't swapped anything, so you are done. That's n-1 comparisons total and forms the lowest number of comparisons you could ever do.
It would take me a while to figure out the worst case. I haven't looked at a bubble sort in decades. But I would guess it's a case where they are reverse ordered. You do the 1st comparison and find the 1st element needs to move. You slide up to the top, comparing to each one, and finally swap it with the last element. So you did n-1 comparisons in that pass. The 2nd pass starts at the 2nd element and does n-2 comparisons, and so forth. So you do (n-1)+(n-2)+(n-3)+...+1 comparisons in this case, which is about (n**2)/2.
Maybe your variation on bubble sort is better than the one I described. No matter.
For bubble sort then, the lower bound is n-1 and the upper bound is (n**2)/2
Other sort algorithms have better performance.
You might want to remember that there are other operations that cost besides comparisons. We use comparisons because much sorting is done with strings and a string comparison is costly in compute time.
You could count element swaps instead (or the sum of comparisons and element swaps), but swaps are typically cheaper than comparisons when sorting strings. If you have numbers, they are similar.
You could also measure more esoteric things like branch prediction failures or memory cache misses.
This is a homework question, and I'm not that good at finding the complexity, but I'm trying my best!
Three-way partitioning is a modification of quicksort that partitions elements into groups smaller than, equal to, and larger than the pivot. Only the groups of smaller and larger elements need to be recursively sorted. Show that if there are N items but only k unique values (in other words there are many duplicates), then the running time of this modification to quicksort is O(Nk).
My try:
On the average case:
the three subarrays will be at these indices:
I assume that the subarray that holds the duplicated items will have size (n-k)
first: from 0 to (i-1)
second: from i to (i+(n-k-1))
third: from (i+n-k) to (n-1)
number of comparisons = (n-k)-1
So,
T(n) = (n-k)-1 + Sigma from 0 until (n-k-1) [ T(i) + T (i-k)]
Then I'm not sure how I'm going to continue :S
It might be a very bad start though :$
Hope to find some help.
First of all, you shouldn't look at the average case since the upper bound of O(nk) can be proved for the worst case, which is a stronger statement.
You should look at the maximum possible depth of recursion. In normal quicksort, the maximum depth is n. For each level, the total number of operations done is O(n), which gives O(n^2) total in the worst case.
Here, it's not hard to prove that the maximum possible depth is k (since one unique value will be removed at each level), which leads to O(nk) total.
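For reference, here is a sketch (my own, using the usual Dutch national flag partitioning scheme) of the three-way quicksort the question describes; each call takes one whole group of equal keys out of play, which is what bounds the recursion depth by k:

```python
import random

def quicksort_3way(a, lo=0, hi=None):
    """Quicksort with three-way partitioning: elements equal to the pivot
    are grouped in the middle and never recursed on."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[random.randint(lo, hi)]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort_3way(a, lo, lt - 1)   # only the "smaller" group
    quicksort_3way(a, gt + 1, hi)   # and the "larger" group are sorted recursively

a = [3, 1, 3, 2, 3, 1, 2, 3]        # N = 8 items, k = 3 unique values
quicksort_3way(a)
print(a)                            # [1, 1, 2, 2, 3, 3, 3, 3]
```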
I don't have a formal education in complexity. But if you think about it as a mathematical problem, you can reason it out like a mathematical proof.
For all sorting algorithms, the best case will always take at least n operations for n elements, because to sort n elements you have to consider each one at least once. Now, for your particular optimisation of quicksort, what you have done is simplify the issue, because now you are only sorting unique values: all the values that are the same as the pivot are already considered sorted, and by virtue of its nature, quicksort guarantees that every unique value will feature as the pivot at some point in the operation, so this eliminates duplicates.
This means that for a list of size N, quicksort must perform some operation N times (once for every position in the list), and because it is trying to sort the list, that operation is trying to find the position of that value in the list. But because you are effectively dealing with just unique values, and there are k of those, the quicksort algorithm performs at most k comparisons for each element. So it performs about Nk operations for an N-sized list with k unique elements.
To summarise:
This algorithm eliminates checking against duplicate values.
But all sorting algorithms must look at every value in the list at least once: N operations.
For every value in the list the operation is to find its position relative to other values in the list.
Because duplicates get removed, this leaves only k values to check against.
O(Nk)