How to generate worst-case data for the Graham scan algorithm

I know that the worst-case running time of Graham scan is O(n log n), but I am not sure how to generate the worst-case data. From what I understand, the bottleneck is the step where the points are sorted, so does that mean I should generate worst-case data for the sorting algorithm I used?
Any help would be appreciated.

Yes, as Matt notes, you need to generate a worst case for the sorting algorithm, since the rest of the algorithm runs in worst-case linear time. This sorting algorithm should be a comparison sort; otherwise, the lower bound may not be valid.
Unfortunately, without knowing the sorting algorithm, it's difficult to point to specific inputs that trigger its worst case. Some sorts, such as quicksort and mergesort, are Θ(n log n) even in the best case. Others, like Timsort and smoothsort, have linear-time best cases. Worse, for any linear-time procedure that takes a length (in unary) and returns a permutation, there is a sorting algorithm that runs in linear time on exactly those permutations: it checks whether the input is permuted that way and falls back to mergesort if not.
The best I can do for an unspecified algorithm is to suggest that you choose a uniform random permutation, since every correct comparison sort averages Ω(n log n) time on that input distribution.
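For example, here is a minimal sketch (in Python, with made-up names, not taken from the answer) of generating Graham-scan input in uniformly random order:

    import random

    def random_scan_input(n, seed=None):
        """Generate n points in uniformly random order.

        Any correct comparison sort averages Omega(n log n) comparisons
        on a uniformly random input order, so this is a reasonable
        stress test when the exact sorting algorithm is unknown.
        """
        rng = random.Random(seed)
        # Random points in the unit square; float coordinates make
        # duplicates effectively impossible.
        points = [(rng.random(), rng.random()) for _ in range(n)]
        rng.shuffle(points)  # make the uniform random permutation explicit
        return points

    points = random_scan_input(100000)
    # feed `points` to your Graham scan implementation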

Related

What is the appropriate data structure for insertion sort?

I revisited the insertion sort algorithm and noticed something funny.
One obviously shouldn't use an array with this sort, since upon each insertion all subsequent elements have to be shifted, giving O(n^2 log(n)). However, a linked list is also not good here, since we would prefer to find the right position using binary search, which isn't possible on a simple linked list (so we end up with O(n^2)).
Which makes me wonder: what is a data structure on which this sorting algorithm achieves its premise of O(n log n) complexity?
Where did you get the premise of O(n log n)? Wikipedia disagrees, as does my own experience. Insertion sort performs a step that is O(n) for each of the n elements.
Also, I believe your claim of O(n^2 log n) is incorrect. The binary search is log n and the ensuing "move sideways" is n, but these two steps happen in succession, not nested, so each insertion costs n + log n, not their product. The total is the expected O(n^2).
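To illustrate (my sketch, not code from the answer): a binary insertion sort on a Python list still spends O(n) per insertion on the shift, even though the search is only O(log n).

    import bisect

    def binary_insertion_sort(a):
        """Insertion sort that uses binary search to find each insertion point.

        The search is O(log i) per element, but list.insert still shifts
        up to i elements, so the overall worst case remains O(n^2).
        """
        out = []
        for x in a:
            pos = bisect.bisect_right(out, x)  # O(log i) search
            out.insert(pos, x)                 # O(i) shift
        return out

    print(binary_insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]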
If you use a gapped array and a binary search to figure out where to insert things, then with high probability your sort will be O(n log(n)). See https://en.wikipedia.org/wiki/Library_sort for details.
However, it is not as efficient in practice as the many other O(n log n) sorts that are widely implemented, so this is mostly of theoretical interest.
Insertion sort is defined over an array or a list; if you use some other data structure, you get a different algorithm.
Of course, if you use a BST, insertion and search are O(log n) on average, so the overall complexity is O(n log n) on average (keep in mind it is O(n^2) in the worst case), but then it is no longer insertion sort: it is tree sort. If you use an AVL tree, you get O(n log n) worst-case complexity.
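As a rough sketch (mine, not the answerer's) of the tree-sort idea with a plain, unbalanced BST, so the worst case is still O(n^2) as noted above:

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def bst_insert(root, key):
        # O(log n) on average, O(n) in the worst case (unbalanced tree).
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = bst_insert(root.left, key)
        else:
            root.right = bst_insert(root.right, key)
        return root

    def tree_sort(a):
        root = None
        for x in a:
            root = bst_insert(root, x)
        out = []
        def inorder(node):  # in-order traversal yields sorted order
            if node is not None:
                inorder(node.left)
                out.append(node.key)
                inorder(node.right)
        inorder(root)
        return out

    print(tree_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]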
In insertion sort the best case is when the sequence is already sorted, which takes linear time; the worst case takes O(n^2) time. I do not know where the logarithmic factor in your complexity comes from.

Big O: minimum time required by an algorithm

I know that for some problems, no matter what algorithm you use, there is a certain minimum amount of time required to solve them. I know big O captures the worst case (maximum time needed), but how can you find the minimum time required as a function of n? For example, can we find the minimum time needed for sorting n integers, or for finding the minimum of n integers?
What you are looking for is called best-case complexity. It is a fairly unhelpful kind of analysis: worst-case analysis is the most important, and average-case analysis is sometimes used in special scenarios.
Best-case complexity depends on the algorithm. For example, in a linear search the best case is when the searched number is at the beginning of the array; in a binary search it is when the target sits at the first dividing point. In these cases the complexity is O(1).
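A small sketch (mine, in Python) of the linear-search case: if the target sits at index 0, the loop returns after a single comparison.

    def linear_search(a, target):
        """Return the index of target in a, or -1 if absent.

        Best case: target is a[0], one comparison, O(1).
        Worst case: target is absent or last, n comparisons, O(n).
        """
        for i, x in enumerate(a):
            if x == target:
                return i
        return -1

    print(linear_search([7, 3, 9, 1], 7))  # best case: found at index 0
    print(linear_search([7, 3, 9, 1], 1))  # worst case: found at the end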
For a single problem, best-case complexity may vary depending on the algorithm. For example, let's discuss some basic sorting algorithms.
In bubble sort the best case is when the array is already sorted, but even then you have to check all elements to be sure, so the best case is O(n) (see the sketch below). The same goes for insertion sort.
For quicksort/mergesort/heapsort the best-case complexity is O(n log n).
For selection sort it is O(n^2).
So from the above you can see that the complexity (whether best, worst, or average) depends on the algorithm, not on the problem.
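Here is a minimal sketch (mine, in Python) of the bubble-sort point above: the early-exit flag makes an already-sorted array take a single O(n) pass.

    def bubble_sort(a):
        """Bubble sort with an early-exit flag.

        Best case (already sorted): one pass, no swaps, O(n).
        Worst case (reverse sorted): O(n^2).
        """
        a = list(a)
        n = len(a)
        for i in range(n - 1):
            swapped = False
            for j in range(n - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
                    swapped = True
            if not swapped:  # no swaps in a full pass: already sorted
                break
        return a

    print(bubble_sort([1, 2, 3, 4]))  # best case: single pass
    print(bubble_sort([4, 3, 2, 1]))  # worst case: all passes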

Does every algorithm have a best-case data input?

Does every algorithm have a 'best case' and a 'worst case'? This question was raised by someone who answered it with "no"! I thought that every algorithm's cases depend on its input, so one algorithm may find a particular set of inputs to be its best case while another algorithm considers the same inputs its worst case.
So which answer is correct? And if there are algorithms that don't have a best case, can you give an example?
Thank You :)
No, not every algorithm has a distinct best and worst case. An example is the linear scan that finds the max/min element of an unsorted array: it always checks every item, no matter what. Its time complexity is therefore Theta(N), independent of the particular input.
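For instance, a minimal sketch (mine, in Python) of such a scan; it performs exactly n - 1 comparisons regardless of the order of the input.

    def find_max(a):
        """Return the maximum of a non-empty list.

        Performs exactly len(a) - 1 comparisons for every input,
        so its running time is Theta(n) with no best/worst gap.
        """
        best = a[0]
        for x in a[1:]:
            if x > best:
                best = x
        return best

    print(find_max([3, 9, 1, 7]))  # 9; same amount of work for any order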
The best-case input is the one for which your code performs the fewest operations. For example, suppose your code has an if branch that iterates over every element, and the else branch does no such work. Then any input for which the code never enters the if block is a best-case input, and conversely, any input for which it does enter the if block is a worst case for this algorithm.
If, for a given algorithm, branching, recursion, or looping causes a difference in the complexity, then it has distinct best-case and worst-case scenarios. Otherwise, you can say that it does not, or that its best-case and worst-case complexities are the same.
Talking about sorting algorithms, let's take merge sort and quicksort as examples (I assume you know them, and their complexities, well).
In merge sort the array is always divided into two equal halves, which gives the log n factor for splitting, and recombining takes O(n) time at each level. So the total complexity is always O(n log n) and does not depend on the input. You can either say that merge sort has no best/worst cases or that its complexity is the same in the best and worst cases.
On the other hand, consider quicksort (not randomized, pivot always the first element). Given a random input, it divides the array into two parts (not necessarily equal, but that doesn't matter much), so the log factor shows up in its complexity (though the base won't always be 2). But if the input is already sorted (ascending or descending), every split yields one element plus the rest of the array, so it takes n - 1 levels of splitting, which turns the O(log n) factor into O(n) and the overall complexity into O(n^2). So quicksort does have best and worst cases with different time complexities.
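A rough sketch (mine, in Python) of first-element-pivot quicksort; counting how many elements get compared against a pivot shows the gap between a shuffled input and an already-sorted one.

    import random

    def quicksort_first_pivot(a, counter):
        """Quicksort that always picks the first element as the pivot.

        counter is a one-element list tallying how many elements are
        partitioned against a pivot.
        """
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]
        counter[0] += len(rest)
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return (quicksort_first_pivot(smaller, counter) + [pivot]
                + quicksort_first_pivot(larger, counter))

    n = 800  # kept small so sorted input stays within Python's recursion limit
    shuffled = random.sample(range(n), n)
    already_sorted = list(range(n))

    c1, c2 = [0], [0]
    quicksort_first_pivot(shuffled, c1)
    quicksort_first_pivot(already_sorted, c2)
    print(c1[0])  # grows like n log n
    print(c2[0])  # n(n-1)/2 = 319600: the quadratic worst case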
Well, I believe every algorithm has a best and worst case though there's no guarantee that they will differ. For example, the algorithm to return the first element in an array has an O(1) best, worst and average case.
Contrived, I know, but my point is that what the best and worst cases are depends entirely on the algorithm; the cases will always exist, even if they coincide or are unbounded at the top end.
I think it's reasonable to say that most algorithms have a best and a worst case. Thinking in terms of asymptotic analysis, an O(n) search algorithm performs worse than an O(log n) one. However, if you give the O(n) algorithm data in which the search item appears early in the data set, and the O(log n) algorithm data in which the search item is the last node to be examined, the O(n) algorithm will run faster than the O(log n) one.
However, an algorithm that has to examine every input item every time, such as one that computes an average, won't have distinct best/worst cases, since the processing time is the same no matter what the data is.
If you are unfamiliar with asymptotic analysis (a.k.a. big O), I suggest learning about it to get a better understanding of what you are asking.

Need an efficient selection algorithm?

I am looking for an algorithm for selecting the N/4-th element of an unsorted array A, where N is the number of elements of A. I want the algorithm to do the selection in sublinear time. I know basic structures like a BST, etc. Which algorithm would be best for me, keeping in mind that I want it to be as fast as possible and not too tough to implement? Here N can be up to 250,000. Any help will be highly appreciated. Note that the array can contain non-unique elements.
As @Jerry Coffin mentioned, you cannot hope to get a sublinear-time algorithm here unless you are willing to do some preprocessing up front. If you want a linear-time algorithm for this problem, you can use the quickselect algorithm, which runs in expected O(n) time with an O(n^2) worst case. The median-of-medians algorithm has worst-case O(n) behavior, but a high constant factor. One algorithm you might find useful is introselect, which combines the two previous algorithms to get a worst-case O(n) algorithm with a low constant factor. This is typically what's used to implement std::nth_element in the C++ standard library.
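For reference, here is a minimal randomized quickselect sketch (mine, in Python, not from the answer); it runs in expected O(n) time and O(n^2) in the worst case. In C++, std::nth_element gives you the same functionality out of the box.

    import random

    def quickselect(a, k):
        """Return the k-th smallest element (0-based) of list a.

        Expected O(n) time with a random pivot; O(n^2) in the worst case.
        """
        a = list(a)
        while True:
            if len(a) == 1:
                return a[0]
            pivot = random.choice(a)
            smaller = [x for x in a if x < pivot]
            equal = [x for x in a if x == pivot]
            larger = [x for x in a if x > pivot]
            if k < len(smaller):
                a = smaller
            elif k < len(smaller) + len(equal):
                return pivot
            else:
                k -= len(smaller) + len(equal)
                a = larger

    data = [9, 1, 7, 3, 3, 8, 2, 6]
    print(quickselect(data, len(data) // 4))  # the N/4-th smallest (0-based)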
If you are willing to do some preprocessing ahead of time, you can put all of the elements into an order statistic tree. From that point forward, you can look up the kth element for any k in time O(log n) worst-case. The preprocessing time required is O(n log n), though, so unless you are making repeated queries this is unlikely to be the best option.
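As a rough illustration of the preprocessing approach (mine; it uses the third-party sortedcontainers package as a stand-in for a hand-rolled order statistic tree):

    # pip install sortedcontainers
    from sortedcontainers import SortedList

    data = [9, 1, 7, 3, 3, 8, 2, 6]

    sl = SortedList(data)    # O(n log n) preprocessing
    print(sl[len(sl) // 4])  # k-th smallest by index, roughly O(log n) per query
    sl.add(5)                # updates are also roughly O(log n)
    print(sl[len(sl) // 4])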
Hope this helps!

Expected running time in randomized algorithm

In most of the calculation analysis of running times, we have assumed that all inputs are equally likely. This is not true, because nearly sorted input, for instance, occurs much more often than is statistically expected, and this causes problems, particularly for quicksort and binary search trees.
By using a randomized algorithm, the particular input is no longer important. The random numbers are important, and we can get an expected running time, where we now average over all possible random numbers instead of over all possible inputs. Using quicksort with a random pivot gives an O(n log n)-expected-time algorithm. This means that for any input, including already-sorted input, the running time is expected to be O(n log n), based on the statistics of random numbers. An expected running time bound is somewhat stronger than an average-case bound but, of course, is weaker than the corresponding worst-case bound.
First, we will see a novel scheme for supporting the binary search tree operations in O(log n) expected time. Once again, this means that there are no bad inputs, just bad random numbers. From a theoretical point of view, this is not terribly exciting, since balanced search trees achieve this bound in the worst case. Nevertheless, the use of randomization leads to relatively simple algorithms for searching, inserting, and especially deleting.
My questions about the above text are:
What does the author mean by "An expected running time bound is somewhat stronger than an average-case bound but, of course, is weaker than the corresponding worst-case bound"?
Regarding binary search trees, what does the author mean by "since balanced search trees achieve this bound in the worst case"? My understanding is that for binary search trees the worst case is O(d), where d is the depth of the node, and d can be N, i.e., O(N). Is the author saying this is the same as the worst case above?
Thanks!
As the author explained in the preceding sentence: an expected-time bound must hold for any input. The average case is averaged over all inputs, that is, you get a reasonably mediocre input. Expected time means that no matter how bad the input is, the algorithm must finish within the bound as long as the random-number god is kind (i.e., gives you a typical draw, not the worst possible one, as she so often does).
Balanced binary search trees. They can't reach depth N because they are balanced.
The author means that, without randomization, quicksort can run slower than O(n log n) on particular inputs. (This is not the case for all sorting algorithms; for merge sort, expected time == average time == O(n log n), and no randomization is needed.)
O(d) = O(log n) for balanced trees
PS who is the author?
In randomized quicksort we can't, even intentionally, produce a bad input (one that would cause the worst case), since the random permutation makes the input order irrelevant. The randomized algorithm performs badly only if the random-number generator produces an unlucky permutation to be sorted. Nearly all permutations make quicksort perform close to the average case; very few cause near-worst-case behavior, so the probability of a worst-case run is very low, and it almost always performs in O(n log n).
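For illustration (my sketch, not from the answer): quicksort with a uniformly random pivot, for which already-sorted input is no longer a bad case.

    import random

    def randomized_quicksort(a):
        """Quicksort with a uniformly random pivot.

        Expected O(n log n) time on every input, including sorted input;
        only an unlucky sequence of random pivots can make it slow.
        """
        if len(a) <= 1:
            return list(a)
        pivot = random.choice(a)
        smaller = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        larger = [x for x in a if x > pivot]
        return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)

    print(randomized_quicksort(list(range(20))))  # sorted input, no problem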
