Closed. This question is opinion-based. It is not currently accepting answers.
I have a question about quicksort and selection sort. I have read many posts here, but none of them answers my question.
Take a look:
We have 10 GB of numbers and we have to sort them. However, we have only 800 MB of memory available, so mergesort is out of the question. Now, because of the huge size of the array, bubblesort is also out of the question.
Personally, I think both sorting algorithms are suited to this job; however, I have to choose only one of them, the one that works better.
Quicksort: average case O(N log N), worst case O(N^2)
Selection sort: average and worst case O(N^2)
Quicksort seems better, but from my experience I think that selection sort is slightly better than quicksort for huge data structures. What do you think? Thank you!
"Selection sort is slightly better than quicksort for huge data structures"? Where did you get this from? The algorithm takes quadratic time, so it's obviously much worse than quicksort. More to the point, how are you going to fit 10 GB in RAM? You can't run any in-memory algorithm on your array if it doesn't fit in RAM. You need an external sorting algorithm, or you could store the data in a database and let the DB engine sort it for you.
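To make the external-sorting suggestion concrete, here is a minimal sketch (not a production implementation) of an external merge sort in Python. It assumes the input is a text file with one integer per line; the chunk size is an arbitrary stand-in for the real memory budget.

import heapq
import os
import tempfile

def external_sort(input_path, output_path, chunk_size=1_000_000):
    # Phase 1: read chunks that fit in memory, sort each, write sorted runs.
    runs = []
    with open(input_path) as f:
        while True:
            chunk = [int(line) for _, line in zip(range(chunk_size), f)]
            if not chunk:
                break
            chunk.sort()
            run = tempfile.NamedTemporaryFile("w+", delete=False)
            run.writelines(f"{x}\n" for x in chunk)
            run.seek(0)
            runs.append(run)
    # Phase 2: k-way merge of the sorted runs; heapq.merge is lazy, so only
    # one value per run is held in memory at any time.
    with open(output_path, "w") as out:
        iters = [(int(line) for line in run) for run in runs]
        for x in heapq.merge(*iters):
            out.write(f"{x}\n")
    for run in runs:
        run.close()
        os.unlink(run.name)

Because heapq.merge streams its inputs, peak memory use is governed by chunk_size rather than the total input size, which is the whole point for the 10 GB / 800 MB scenario.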
Quicksort is better than selection sort for data this large. Selection sort might perform relatively better when the test data contains large stretches of already sorted data, but that doesn't in any way make it better than quicksort. The main problem in your case is how to sort such huge data at all, since it cannot be held in memory while being processed.
Quicksort should be used in this situation, since it is generally the fastest comparison-based sort in practice. Because selection sort scans through every remaining element to find the smallest one and put it at the front, it will take much longer (especially on a huge data set like the one mentioned), even with a limited amount of memory.
Closed. This question needs to be more focused. It is not currently accepting answers.
Which sorting algorithm fits best for singly and doubly linked lists that have fewer than 20 items, or for almost-sorted lists? I am trying to understand which sorting algorithms fit small lists; I understand the situation for arrays, but I don't understand how it is for linked lists.
Insertion sort works quite fast on nearly sorted input and is a good option for a doubly linked list too. For small inputs it doesn't really matter much which algorithm you prefer, since all of them finish in practically negligible time. Note, however, that the advanced algorithms are a bit of an overkill if there are only 10-20 elements to be sorted; their overhead is comparatively big.
On linked lists, merge sort can be performed in place, without an extra array, since two sorted lists can be merged simply by relinking nodes, using only O(1) extra space. Quicksort, however, is worse: it relies heavily on indexing and random access, which is something linked structures are bad at.
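As a sketch of that point (a minimal example under my own assumptions, with a hypothetical Node class that is not part of the question), here is a merge sort over a singly linked list in Python that merges by relinking nodes rather than copying into an array:

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def merge_sort(head):
    # A list of zero or one nodes is already sorted.
    if head is None or head.next is None:
        return head
    # Split the list in half with slow/fast pointers.
    slow, fast = head, head.next
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
    mid, slow.next = slow.next, None
    left, right = merge_sort(head), merge_sort(mid)
    # Merge by relinking existing nodes: no auxiliary array needed.
    dummy = tail = Node(None)
    while left and right:
        if left.value <= right.value:
            tail.next, left = left, left.next
        else:
            tail.next, right = right, right.next
        tail = tail.next
    tail.next = left or right
    return dummy.next

Each merge only rewires next pointers, which is where the O(1) extra space comes from; the recursion itself still uses O(log N) stack frames.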
Among the simple algorithms, selection sort is usually never the best choice, since it always takes O(N^2) time. Bubble sort and insertion sort have a best case of O(N) and the same O(N^2) worst case as selection sort.
Insertion sort, on the other hand, does not work well on a singly linked list, since we cannot move backwards, only forwards. Bubble sort works fine. On a doubly linked list, both perform well.
Closed. This question needs to be more focused. It is not currently accepting answers.
I'm learning about data structures and sorting algorithms, and I have some questions I want to ask:
When should we choose an array and when a linked list for a sorting algorithm?
Which sorting algorithm should we use for small data and which for big data? I know it depends on the situation and that we should choose whichever algorithm fits it, but I can't understand the specifics.
Linked-list or array
Array is the more common choice.
Linked-list is mostly used when your data is already in a linked-list, or you need it in a linked-list for your application.
Not that I've really seen a justifiable cause to use one over the other (except that most sorting algorithms are focused on arrays). Both can be sorted in O(n log n), at least with comparison-based sorting algorithms.
When to use what
With comparison-based sorting, insertion sort is typically used for < ~10-20 elements, as it has low constant factors, even though it has O(n²) running time. For more elements, quick-sort or merge-sort (both running in O(n log n)) or some derivation of either is typically faster (although there are other O(n log n) sorting algorithms).
Insertion sort also performs well (O(n)) on nearly sorted data.
For non-comparison-based sorting, it really depends on your data. Radix sort, bucket sort and counting sort are all well-known examples, and each have their respective uses. A brief look at their running time should give you a good idea of when they should be used. Counting sort, for example, is good if the range of values to be sorted is really small.
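To illustrate that last point, here is a minimal counting sort sketch in Python; the assumption that the keys are non-negative integers in a small known range is mine, chosen to match the "small range of values" case:

def counting_sort(values, max_value):
    # counts[v] = number of occurrences of v; this needs O(max_value) extra
    # memory, which is why a small value range is essential.
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1
    out = []
    for v, c in enumerate(counts):
        out.extend([v] * c)
    return out

# e.g. counting_sort([3, 1, 2, 1, 0], 3) -> [0, 1, 1, 2, 3]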
You can see Wikipedia for a list of sorting algorithms.
Keep in mind that sorting less than like 10000 elements would be blazingly fast with any of these sorting algorithms - unless you need the absolute best performance, you could really pick whichever one you want.
To my understanding, there is no definitive answer to either question, as both depend on the context of usage. However, the following points might be of importance:
If the records to be sorted are large and implemented as a value type, an array might be unfavourable, since exchanging records involves copying data, which might be slower than redirecting references.
The instance size at which to switch sorting algorithms is usually found by experimentation in a specific context; perhaps quicksort is used for the 'large' instances, whereas merge sort is used for the 'small' instances, where the best separation between 'large' and 'small' is found by trying it out in the specific context (a sketch of such a size cutoff follows below).
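Here is a minimal sketch of such a size cutoff, assuming (my choice, not the answer's) a recursive quicksort that hands small sub-arrays to insertion sort; the threshold of 16 is an arbitrary starting point to be tuned by experiment:

def insertion_sort(a, lo, hi):
    # Sort a[lo..hi] in place; cheap for tiny or nearly sorted ranges.
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def hybrid_sort(a, lo=0, hi=None, cutoff=16):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= cutoff:
        insertion_sort(a, lo, hi)          # 'small' instance
        return
    # 'large' instance: partition around a middle pivot, then recurse.
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i, j = i + 1, j - 1
    hybrid_sort(a, lo, j, cutoff)
    hybrid_sort(a, i, hi, cutoff)

Timing hybrid_sort against a pure quicksort (cutoff=1) over a range of cutoff values is exactly the kind of experiment described above.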
Closed. This question needs to be more focused. It is not currently accepting answers.
I am presently studying sorting algorithms. I have studied that the quicksort algorithm's performance depends on the initial organization of the data: if the array is already sorted, quicksort becomes slower. Is there any other sort whose performance depends on the initial organization of the data?
Of course. Insertion sort will run in O(n) on descending sorted input:

def insertion_sort(arr):
    out = []
    while arr:
        x = arr.pop(0)             # take the next element from the front
        # find the insertion point in the already sorted output
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

because on descending input each popped element is smaller than everything already in out, so every insertion point is found after a single comparison. If arr.pop() (popping from the back) is used instead of arr.pop(0), the same routine is fastest on ascending sorted input. (This counts only comparisons and assumes that popping and inserting are themselves O(1), which holds for a linked list but not for the front of a Python list.)
All fast sort algorithms minimize comparison and move operations. Minimizing move operations is dependent on the initial element ordering. I'm assuming you mean initial element ordering by initial organization.
Additionally, the fastest real-world algorithms exploit locality of reference, which also shows a dependence on the initial ordering.
If you are only interested in dependencies that slow down or speed up the sorting dramatically: bubble sort, for example, will complete in a single pass on already sorted data.
Finally, many sorting algorithms have average time complexity O(N log N) but worst-case complexity O(N^2). What this means is that there exist specific inputs (e.g. sorted or reverse-sorted ones) that provoke the bad O(N^2) run-time behaviour in these algorithms. Some quicksort versions are examples of such algorithms.
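To make that concrete, here is a small sketch that counts partition comparisons for a deliberately naive quicksort which always picks the first element as pivot; the input sizes and the comparison counter are just illustrative assumptions:

import random

def quicksort_first_pivot(a, counter):
    # Naive pivot choice: always the first element.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    counter[0] += len(rest)        # each remaining element is compared to the pivot
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return (quicksort_first_pivot(left, counter) + [pivot]
            + quicksort_first_pivot(right, counter))

for n in (100, 200, 400):
    sorted_count, random_count = [0], [0]
    quicksort_first_pivot(list(range(n)), sorted_count)            # already sorted: worst case
    quicksort_first_pivot(random.sample(range(n), n), random_count)
    print(n, sorted_count[0], random_count[0])                     # roughly n^2/2 vs n log n

On sorted input the partition is maximally unbalanced at every step, which is exactly the degenerate behaviour the question describes.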
If what you're asking is "should I worry about which sorting algorithm to pick on a case-by-case basis?", then unless you're processing billions of operations, the short answer is "no". Most of the time quicksort will be just fine (a quicksort with a well-chosen pivot, like Java's).
In general cases, quicksort is good enough.
On the other hand, if your system always receives the source data in a consistent initial ordering, and each sort costs significant CPU time and power, then you should definitely find the right algorithm for that corner case.
Closed. This question needs to be more focused. It is not currently accepting answers.
In usual circumstances, sorting arrays of thousands of simple items like integers or floats is fast enough that the small differences between implementations just don't matter.
But what if you need to sort N modestly sized arrays that have been generated by some similar process, or that simply have some relatedness?
I leave the specifics of the mysterious array generator and the relationships between the generated arrays intentionally vague. It is up to any applicable algorithm to specify as large a domain as possible in which it works and will be most useful.
EDIT: Let's narrow this by letting the arrays be independent samples. There exists an unchanging probability distribution over the arrays that will be generated. Implicitly, then, there is a stable probability distribution of elements within the arrays, but it is conditional: the elements within an array might not be independent. It seems like it would be extremely hard to make use of relationships between elements within an array, but I could be wrong. We can narrow further, if needed, by letting the elements in the arrays be independent. In that case the problem is to effectively learn and use the probability distribution of the elements of the arrays.
Here is a paper on a self-improving sorting algorithm. I am pretty strong with algorithms and machine learning, but this paper is definitely not an easy read for me.
The abstract says this:
We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an arbitrary, unknown input distribution. We give such self-improving algorithms for sorting and clustering. The highlights of this work: a sorting algorithm with optimal expected limiting running time ...

In all cases, the algorithm begins with a learning phase during which it adjusts itself to the input distribution (typically in a logarithmic number of rounds), followed by a stationary regime in which the algorithm settles to its optimized incarnation.
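The following is not the paper's algorithm, just a toy Python sketch of the general "learning phase, then stationary regime" idea under my own simplifying assumptions: the learning phase estimates bucket boundaries as empirical quantiles of the training arrays, and later arrays are bucketed by those boundaries and sorted bucket by bucket.

import bisect
import random

def learn_boundaries(training_arrays, num_buckets=16):
    # Learning phase (toy version): pool all training elements and take
    # empirical quantiles as bucket boundaries.
    pooled = sorted(x for arr in training_arrays for x in arr)
    step = max(1, len(pooled) // num_buckets)
    return pooled[step::step][:num_buckets - 1]

def distribution_aware_sort(arr, boundaries):
    # Stationary regime (toy version): drop each element into its learned
    # bucket, then sort the small buckets and concatenate.
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for x in arr:
        buckets[bisect.bisect_left(boundaries, x)].append(x)
    out = []
    for b in buckets:
        b.sort()
        out.extend(b)
    return out

# Toy usage: learn from a few sample arrays drawn from the (unknown)
# distribution, then reuse the boundaries for every later array.
train = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(20)]
bounds = learn_boundaries(train)
result = distribution_aware_sort([random.gauss(0, 1) for _ in range(1000)], bounds)
assert result == sorted(result)

If the learned boundaries match the true distribution, each bucket stays small and the per-array work drops accordingly; the algorithm in the paper achieves much stronger guarantees (optimal expected limiting running time, per the abstract).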
Closed. This question needs to be more focused. It is not currently accepting answers.
Let's say we want to find some known key in an array and extract its value. There are two possible approaches (maybe more?) to do it: a linear approach, in which we compare each array key with the needle, O(N); or we can sort the array, O(N*log(N)), and then apply binary search, O(log(N)). I have several questions about this.
So, as far as I can see, sorting is closely related to searching, but standalone sorting is useless; sorting is an instrument to simplify searching. Am I correct? Or are there other uses of sorting?
If we talk about searching, then we can search unsorted data in O(N), or sorted data in O(N*log(N)) + O(log(N)) including the sort. Searching can exist separately from sorting. So, when we need to find something in an array only once we should use linear search, and if the search is repeated we should sort the data first and then search?
Don't assume that an O(n * lg(n)) sort is needed before every search. That would be wasteful, because O(n * lg(n)) + O(lg(n)) is worse than O(n); it would be quicker to do a linear search on the randomly ordered data, which on average takes about n/2 comparisons.
The idea is to sort your random data only once, up front, with an O(n * lg(n)) algorithm; any data added afterwards should be inserted in sorted order, so that every search thereafter can be done in O(lg(n)) time.
You might also be interested in hash tables, which are a kind of unsorted array with O(1) expected access time.
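As a small sketch of that trade-off (the data and names are made up for illustration), here is the "search once linearly vs. sort once and binary-search repeatedly" idea in Python, using the standard bisect module:

import bisect

haystack = [42, 7, 19, 3, 23, 88, 5]      # example data, initially unsorted

# One-off lookup: a linear scan is the right tool, O(n).
found_once = 23 in haystack

# Repeated lookups: pay O(n log n) once to sort, then O(log n) per search.
sorted_data = sorted(haystack)

def contains(sorted_list, x):
    i = bisect.bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x

# Later additions go in at their sorted position so searches stay O(log n).
# (The insertion itself still shifts elements, O(n), in a Python list.)
bisect.insort(sorted_data, 50)

print(found_once, contains(sorted_data, 23), contains(sorted_data, 50))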
It is extremely rare that you would create an array of N items and then search it only once. It is therefore usually profitable to improve the data structure holding the items so as to speed up searching (amortize the set-up time over all the searches and see whether you save time overall).
However there are many other considerations: Do you need to add new items to the collection? Do you need to remove items from the collection? Are you willing to spend extra memory in order to improve sort time? Do you care about the original order in which the items were added to the collection? All of these factors, and more, influence your choice of container and searching technique.