Just a simple algorithm to sort small integers, but it must be O(n).
A radix sort is one approach that's O(n). Since you're dealing with small integers, it shouldn't be too hard to implement.
Of course, the fine print in the definition of O(n) is what gets you there. Radix sort, e.g., is really n*log(n) when you figure that you must create a deeper tree as you accommodate more values -- it only manages to be O(n) by the trick of capping the number of values to be sorted. There's no way to really beat n*log(n) in the general sense.
E.g., for 8-bit values I can easily achieve O(n) by simply having a 256-entry array. But if I go to, say, even 32-bit values, then I must have an array with 4G entries, and the address decoder for the memory chip holding that array will have grown with log(n) of the size of the chip. Yes, I can say that the version with 4G entries is O(n), but at an electronic level the addressing is log(n) slower and more complex. Additionally, the buses inside the chip must drive more current, and it will take longer for a memory cell, once "read", to dump its contents onto the bus. And all those effects are log(n).
Simply put:
If you have no prior information about the numbers you're sorting, you cannot do better than O(n log n) on average.
If you have more information (like the fact that you're dealing with integers), there are O(n) algorithms.
A great resource is these Wikipedia tables. Have a look at the second one.
To the best of my knowledge, comparison-based sorting algorithms share a lower bound of Ω(n log n).
To achieve O(n), we can't use any comparison-based algorithm, and the input must have additional properties.
In your example, "small integers", I guess, means that the integers fall within a specified range.
If that is the case, you can try a bucket/radix sort algorithm, which does not require any comparisons.
For a simple example, suppose you have n integers to be sorted, all of which lie in the interval [1, 1000]. You just make 1000 buckets and go over the n integers; if an integer is equal to 500, it goes into bucket 500, etc. Finally you concatenate all the buckets to obtain the sorted list. This algorithm takes O(n).
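A minimal sketch of this bucket approach in Python (the function name and the [1, 1000] bound are just illustrative):

    def bucket_sort_small_ints(nums, max_value=1000):
        # One bucket per possible value; O(n + max_value) overall.
        buckets = [[] for _ in range(max_value + 1)]
        for x in nums:
            buckets[x].append(x)          # O(1) per element
        # Concatenate the buckets in order to obtain the sorted list.
        return [x for bucket in buckets for x in bucket]

    print(bucket_sort_small_ints([500, 3, 1000, 3, 42]))  # [3, 3, 42, 500, 1000]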
The optimum for a comparison-based sort is O(n*log(n)); the proof is not very difficult. BUT you may use counting sort, which is enumeration-based, or the very similar bucket sort... You may also use radix sort, though it is not a sort by itself: radix sort just iteratively calls some other stable sort...
Related
Suppose we have 1 million entries of an object 'Person' with two fields, 'Name' and 'Age'. The problem is to sort the entries based on the 'Age' of the person.
I was asked this question in an interview. I answered that we could use an array to store the objects and use quicksort, as that would save us from using additional space, but the interviewer told me that memory was not a factor.
My question is: what would be the factor that decides which sort to use?
Also, what would be the preferred way to store this?
In this scenario, does any sorting algorithm have an advantage over another and result in better complexity?
This Stackoverflow link may be useful to you.
The answers above are sufficient, but I would like to add some more information from that link; I am copying some of its answers over here.
We should note that even if the fields in the Object are very big (i.e. long names), you do not need to use a file-system sort; you can use an in-memory sort, because:
#elements * 8 bytes ≈ 762 MB (most modern systems have enough memory for that), where the 8 bytes per element are the key (age) plus a pointer to the struct on a 32-bit system.
It is important to minimize disk accesses, because disks are not random-access and disk accesses are MUCH slower than RAM accesses.
Now, use a sort of your choice on that - and avoid using disk for the sorting process.
Some possibilities of sorts (on RAM) for this case are:
Standard quicksort or merge sort (which you had already thought of)
Bucket sort can also be applied here, since the range is limited to [0, 150] (which others here have described under the name counting sort)
Radix sort (for the same reason: radix sort will need ceil(log_2(150)) ≈ 8 iterations)
I wanted to point out the memory aspect in case you encounter the same question but need to answer it with memory constraints taken into consideration. In fact, your constraints are even smaller (10^6 elements compared to the 10^8 in the other question).
As for the matter of storing it -
The quickest way to sort would be to allocate 151 linked lists/vectors (call them buckets or whatever you like, depending on the language you prefer) and put each person's data structure into the bucket matching his/her age (all people's ages are between 0 and 150):
bucket[person->age].add(person)
As others have pointed out, bucket sort is going to be the better option for you.
In fact, the beauty of bucket sort is that if you have to perform any operation on ranges of ages (like 10-50 years of age), you can partition your buckets according to your requirements (e.g. a different bucket range for each bucket).
Again, I have copied this information from the answers in the link given above, but I believe it may be useful to you.
If the array has n elements, then quicksort (or, actually, any comparison-based sort) is Ω(n log(n)).
Here, though, it looks like you have an alternative to comparison-based sorting, since you need to sort only on age. Suppose there are m distinct ages. In this case, counting sort will be Θ(m + n). For the specifics of your question, assuming that age is in years, m is much smaller than n, and you can do this in linear time.
The implementation is trivial. Simply create an array of, say, 200 entries (200 being an upper bound on the age). The array is of linked lists. Scan over the people, and place each person in the linked list in the appropriate entry. Now, just concatenate the lists according to the positions in the array.
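A sketch of that trivial implementation in Python (the Person class and the bound of 200 follow the description above; the rest of the names are illustrative):

    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

    def sort_by_age(people, max_age=200):
        # One list per possible age; appending preserves input order,
        # so the result is a stable sort on age, in Theta(m + n).
        buckets = [[] for _ in range(max_age + 1)]
        for p in people:
            buckets[p.age].append(p)
        # Concatenate the lists according to their positions in the array.
        return [p for bucket in buckets for p in bucket]

    people = [Person("Alice", 30), Person("Bob", 25), Person("Carol", 30)]
    print([(p.name, p.age) for p in sort_by_age(people)])
    # [('Bob', 25), ('Alice', 30), ('Carol', 30)]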
Different sorting algorithms perform at different complexities, yes. Some use different amounts of space. And in practice, real performance with the same complexity varies too.
http://www.cprogramming.com/tutorial/computersciencetheory/sortcomp.html
There are different ways to set up a quicksort's partition method that could have an effect for ages, and shell sort can use different gap sequences that perform better for certain types of input. But maybe your interviewer was more interested in you noticing that 1 million people will include a lot of duplicate ages, which might mean you want a 3-way quicksort or, as suggested in the comments, a counting sort.
This is an interview question, so I guess the interviewee's reasoning is more important than naming the correct sorting algorithm. Your problem is to sort an array of objects whose age field is an integer. Age has some special properties:
integer: there are sorting algorithms specially designed for integers.
finite: you know the maximum age of people, right? For example, say it is 200.
I will list some sorting algorithms for this problem, with advantages and disadvantages, that are suitable for an interview session:
Quicksort: complexity is O(N log N) and it can be applied to any data set. Quicksort is typically the fastest sort that uses a compare operator between two elements. Its biggest disadvantage is that it isn't stable: two objects equal in age may not keep their relative order after sorting.
Merge sort: complexity is O(N log N). A little slower than quicksort, but it is a stable sort, and it can also be applied to any data set.
Radix sort: complexity is O(w*n), where n is the size of your list and w is the maximum number of digits in your dataset. For example: the length of 12 is 2, the length of 154 is 3. So if people's maximum age is 99, the complexity is O(2*n). This algorithm only applies to integers or strings (see the sketch below).
Counting sort: complexity is O(m+n), where n is the size of your list and m is the number of distinct ages. This algorithm only applies to integers.
Because we are sorting a million entries and all values are integers in the range 0..200, there are tons of duplicate values. So counting sort is the best fit, with complexity O(200 + N), where N ≈ 1,000,000; the 200 is negligible.
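A minimal LSD radix sort sketch in Python, as mentioned in the list above (base 10; the function name and the default digit count are illustrative):

    def radix_sort(nums, digits=3):
        # One stable bucket pass per digit, least-significant first:
        # O(w * n) for w digits, and stable overall.
        for d in range(digits):
            buckets = [[] for _ in range(10)]      # one bucket per digit value 0-9
            for x in nums:
                buckets[(x // 10 ** d) % 10].append(x)
            nums = [x for bucket in buckets for x in bucket]
        return nums

    print(radix_sort([154, 12, 99, 7]))  # [7, 12, 99, 154]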
If you assume a finite number of different age values (usually people are not older than 100), then you can use counting sort (https://en.wikipedia.org/wiki/Counting_sort). You would be able to sort in linear time.
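A sketch of that counting sort on plain integer ages, following the linked description (the bound of 100 and the names are illustrative):

    def counting_sort_ages(ages, max_age=100):
        # Count how many times each age occurs: O(n + max_age).
        counts = [0] * (max_age + 1)
        for a in ages:
            counts[a] += 1
        # Emit each age as often as it occurred.
        result = []
        for age, count in enumerate(counts):
            result.extend([age] * count)
        return result

    print(counting_sort_ages([30, 25, 30, 99]))  # [25, 30, 30, 99]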
In (most of the) research papers on sorting, the authors conclude that their algorithm takes n-1 comparisons to sort an 'n'-sized array (where n is the size of the array)
...so and so
but when it comes to coding, the code uses more comparisons than concluded.
More specifically, what assumptions do they make about the comparisons?
What kinds of comparisons do they not take into account?
For example, if you take a look at freezing sort or Enhanced Insertion sort: the number of comparisons these algorithms take in actual code is greater than what is specified in the graph (number of comparisons vs. number of elements).
The least possible number of comparisons in a sorting algorithm is n-1. In this case, you wouldn't actually be sorting at all; you'd just be checking whether the data is already sorted, essentially comparing each element to the ones directly before and after it (this is the best case for insertion sort). It's fairly easy to see that it's impossible to do fewer comparisons than this: otherwise you'd have more than one disjoint set of compared elements, meaning you wouldn't know how the elements across these sets compare to each other.
If we're talking about the average/worst case, it's actually been proven that the number of comparisons required is Ω(n log n).
An algorithm being recursive or iterative doesn't (directly) affect the number of comparisons. The only statement we could make specifically about recursive sorting algorithms is perhaps about the recursion depth. This depends greatly on the algorithm, but quicksort, specifically, has a (worst-case) recursion depth around n-1.
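For illustration, checking whether data is already sorted takes exactly n-1 element comparisons; a minimal sketch (not code from any of the papers in question):

    def is_sorted(a):
        # Exactly len(a) - 1 element comparisons: each element vs. its successor.
        return all(a[i] <= a[i + 1] for i in range(len(a) - 1))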
More comparisons that are often ignored in papers, but are performed in real code, are the comparisons for branches (if (<stop clause>) return ...;), and similarly for loop iterators.
One reason why they are mostly ignored is that they are done on indices, which are of constant size, while the compared elements (which we do count) might take more time, depending on the actual type being compared (strings might take longer to compare than integers, for example).
Also note that an array cannot be sorted using n-1 comparisons (worst/average case), since sorting is an Omega(n log n) problem.
However, it is possible that what the author meant is that the sort takes n-1 comparisons at each step of the algorithm, and there could be multiple (typically O(log n)) such steps.
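To see the gap the question describes, one can instrument a sort and count only element comparisons, ignoring the index and branch comparisons discussed above; a hypothetical sketch using insertion sort:

    def insertion_sort_counting(a):
        # Counts element comparisons only; 'j > 0' is an index comparison
        # and is NOT counted, mirroring the convention used in papers.
        comparisons = 0
        for i in range(1, len(a)):
            j = i
            while j > 0:
                comparisons += 1
                if a[j - 1] > a[j]:
                    a[j - 1], a[j] = a[j], a[j - 1]
                    j -= 1
                else:
                    break
        return comparisons

    print(insertion_sort_counting([1, 2, 3, 4]))  # best case: n - 1 = 3
    print(insertion_sort_counting([4, 3, 2, 1]))  # worst case: n(n-1)/2 = 6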
I know this might be a silly question, but suppose an algorithm sorts 100,000 integers in 5 seconds. Will the same algorithm sort 100,000 strings in 5 seconds as well, or will the sorting times differ?
Comparing strings is more expensive than comparing integers. An integer comparison is an operation that usually happens directly in the CPU. A comparison of strings, on the other hand, needs to be implemented in software; hence the comparison of two strings can take as many operations as there are characters in the strings.
Q: Do algorithms sort integers and strings with the same time consistency?
A: If you mean "asymptotic"/"Big O notation": Yes. Strings will obviously take longer ... but proportionately longer.
The simple answer is no. Sorting strings will take far longer, usually, than sorting integers.
Consider sorting "paper" and "papa" -- you'd need to find that "paper" was greater than "papa" by comparing the first 4 characters. This can get even longer with larger strings, especially phrases.
In contrast, an integer comparison will take a single instruction.
To do the sorting, you need to do these kinds of comparisons repeatedly, so sorting strings is almost always going to take much longer.
The strings will take more time. Integers are compared with a few bitwise/machine operations. With strings, each byte or character has to be compared, and each of those comparisons costs about as much as one integer comparison.
In effect, sorting a set of strings can involve comparing the total number of characters across all strings in the worst case. The worst case is when all the strings are the same: every string must be traversed to the end to confirm equality (strcmp() compares this way).
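A sketch of why, as a strcmp-style comparison written out in Python (illustrative; not the actual C library routine):

    def compare(a: str, b: str) -> int:
        # Return <0, 0, or >0, like strcmp; the cost grows with the common prefix.
        for ca, cb in zip(a, b):
            if ca != cb:                    # stop at the first differing character
                return -1 if ca < cb else 1
        return len(a) - len(b)              # equal prefix: shorter string sorts first

    print(compare("paper", "papa"))  # > 0, decided only at the 4th character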
Suppose you are sorting them in dictionary order with quicksort, and you write your own comparison functions for your strings and integers.
Integers are quite simple to compare, since the value itself is the key, while with strings you need to compare the contents. We know that quicksort runs in an average time complexity of O(n log n), with a worst case of O(n^2), which happens when the chosen pivot is the minimum or maximum of the current partition. We'll assume it all goes well and take the average case for now.
The variable in the time complexity is your comparison function: the integer one is quite fast, O(1), while comparing strings has a worst case of O(min(len(a), len(b))) and a best case of O(1), when the first characters already differ.
So yes, it's slower, but not that bad considering they're strings; also, the worst cases rarely happen (like what Barney Goven mentioned: comparing two nearly identical strings).
When talking about the time complexity of an algorithm, we're generally talking about its growth as the data set gets bigger. So, as paulsm4 said, it's proportionately longer.
Possible Duplicate:
What data-structure should I use to create my own “BigInteger” class?
Out of pure interest, I am trying to design a type that can hold an arbitrarily large integer. I want to support the four basic operations [+, -, *, /] and optimise for the speed of those operations.
I was thinking of some sort of doubly-linked list and a bit flag to indicate positive or negative value. But I am not really sure how to add, for example, two large numbers of different sizes. Should I walk to the last element of both numbers and then work back (using the second, reverse pointer to the previous element)?
123456789 //one large number
+ 123 //another large number with different size
Provided I can have arbitrarily large memory, what is the best data structure for this task?
I would appreciate a small hint and any comments on the worst-case complexity of the arithmetic operations. Thanks!
Usually one would go for an array/vector in this case, perhaps little-endian (least-significant word first). If you implement in-place operations, grow the array by a constant factor; the amortized complexity of reallocation then remains O(1).
All operations should be doable in O(n) running time, where n is the size of the input. EDIT: No, of course, multiplication and division will need more; this answer says multiplication is at least O(N log N).
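A sketch of addition on that little-endian array representation (the base-10^9 limbs and the names are illustrative; real libraries use machine words):

    BASE = 10 ** 9  # one "limb" per array entry

    def add(a, b):
        # Add two non-negative big integers stored as little-endian limb lists;
        # O(n) in the number of limbs.
        result, carry = [], 0
        for i in range(max(len(a), len(b))):
            s = carry
            if i < len(a): s += a[i]
            if i < len(b): s += b[i]
            result.append(s % BASE)   # low limb of the partial sum
            carry = s // BASE         # propagate the carry to the next limb
        if carry:
            result.append(carry)
        return result

    print(add([123456789], [123]))  # [123456912], i.e. 123456789 + 123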
Just out of curiosity: why are you reinventing the wheel? Find here an implementation in Java. C# has one with .NET 4.0, too. While it might be a good exercise to implement this yourself (I remember doing it once myself), if you just need the functionality, then it's already there in many computing environments.
Is there an array sort that works in O(n log n) worst-case time complexity?
I saw on Wikipedia that there are sorts like that, but they are unstable; what does that mean? And is there a way to sort with low space complexity?
Is there a best sorting algorithm?
An algorithm that requires only O(1) extra memory (so modifying the input array is permitted) is generally described as "in-place", and that's the lowest space complexity there is.
A sort is described as "stable" or not, according to what happens when there are two elements in the input which compare as equal, but are somehow distinguishable. For example, suppose you have a bunch of records with an integer field and a string field, and you sort them on the integer field. The question is, if two records have the same integer value but different string values, then will the one that came first in the input, also come first in the output, or is it possible that they will be reversed? A stable sort is one that guarantees to preserve the order of elements that compare the same, but aren't identical.
It is difficult to make a comparison sort that is in-place, and stable, and achieves O(n log n) worst-case time complexity. I've a vague idea that it's unknown whether or not it's possible, but I don't keep up to date on it.
Last time someone asked about the subject, I found a couple of relevant papers, although that question wasn't identical to this question:
How to sort in-place using the merge sort algorithm?
As far as a "best" sort is concerned - some sorting strategies take advantage of the fact that on the whole, taken across a large number of applications, computers spend a lot of time sorting data that isn't randomly shuffled, it has some structure to it. Timsort is an algorithm to take advantage of commonly-encountered structure. It performs very well in a lot of practical applications. You can't describe it as a "best" sort, since it's a heuristic that appears to do well in practice, rather than being a strict improvement on previous algorithms. But it's the "best" known overall in the opinion of people who ship it as their default sort (Python, Java 7, Android). You probably wouldn't describe it as "low space complexity", though, it's no better than a standard merge sort.
You can choose between mergesort, quicksort and heapsort, all nicely described here.
There is also radix sort, whose complexity is O(kN), but it takes full advantage of extra memory consumption.
You can also see that for smaller collections quicksort is faster, but then mergesort takes the lead; all of this is case-specific, so take your time to study all four algorithms.
For the question of the best algorithm, the simple answer is: it depends. It depends on the size of the data set you want to sort and on your requirements. Say bubble sort has worst-case and average complexity both O(n^2), where n is the number of items being sorted. There exist many sorting algorithms with substantially better worst-case or average complexity of O(n log n). Even other O(n^2) sorting algorithms, such as insertion sort, tend to perform better than bubble sort. Therefore, bubble sort is not a practical sorting algorithm when n is large.
Among simple average-case Θ(n^2) algorithms, selection sort almost always outperforms bubble sort, but is generally outperformed by insertion sort.
Selection sort is greatly outperformed on larger arrays by Θ(n log n) divide-and-conquer algorithms such as mergesort. However, insertion sort and selection sort are both typically faster on small arrays.
Likewise, you can select the best sorting algorithm according to your own requirements.
It is proven that Ω(n log n) is the lower bound for sorting generic items by comparisons. It is also proven that Ω(n) is the lower bound for sorting integers (you need at least to read the input :) ).
The specific instance of the problem will determine the best algorithm for your needs, i.e. sorting 1M strings is different from sorting 2M 7-bit integers in 2MB of RAM.
Also consider that besides the asymptotic runtime complexity, the implementation makes a lot of difference, as do the amount of available memory and the caching policy.
I could implement quicksort in one line of Python, roughly keeping O(n log n) complexity (with some caveats about the pivot), but Big-O notation says nothing about the constant factors, which are relevant too (i.e. this is ~30x slower than Python's built-in sort, which is likely written in C, by the way):
qsort = lambda a: [] if not a else qsort([x for x in a if x < a[len(a)//2]]) + [x for x in a if x == a[len(a)//2]] + qsort([x for x in a if x > a[len(a)//2]])
For a discussion about stable/unstable sorting, look here http://www.developerfusion.com/article/3824/a-guide-to-sorting/6/.
You may want to get yourself a good algorithms book (e.g. Cormen or Skiena).
Heapsort, maybe randomized quicksort
stable sort
As others have already mentioned: no, there isn't. For example, you might want to parallelize your sorting algorithm, which leads to totally different sorting algorithms.
Regarding the part of your question about what "stable" means, consider the following: we have a class of children with associated ages:
Phil, 10
Hans, 10
Eva, 9
Anna, 9
Emil, 8
Jonas, 10
Now we want to sort the children in order of ascending age (and nothing else). We see that Phil, Hans and Jonas all have age 10, so it is not clear in which order to place them, since we sort by age alone.
Now comes stability: if we sort stably, we keep Phil, Hans and Jonas in the order they were in before, i.e. we put Phil first, then Hans, and Jonas last (simply because they appeared in this order in the original sequence and we only consider age as the comparison criterion). Similarly, we have to put Eva before Anna (both the same age, but Eva came before Anna in the original sequence).
So, the result is:
Emil, 8
Eva, 9
Anna, 9
Phil, 10 \
Hans, 10 | all aged 10, and left in original order.
Jonas, 10 /
To put it in a nutshell: Stability means that if two elements are equal (w.r.t. the chosen sorting criterion), the one coming first in the original sequence still comes first in the resulting sequence.
Note that you can easily transform any sorting algorithm into a stable sorting algorithm: if your original sequence holds n elements e1, e2, e3, ..., en, you simply attach a counter to each one: (e1, 0), (e2, 1), (e3, 2), ..., (en, n-1). This means you store, for each element, its original position.
If two elements are now equal, you simply compare their counters and put the one with the lower counter value first. This increases runtime (and memory) by O(n), which is asymptotically no worse, since the best (comparison-based) sorting algorithm already needs O(n lg n).
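A sketch of that counter transformation in Python (names are illustrative; ties on the key are broken by the attached original position, so even an unstable underlying sort would produce a stable result):

    def stable_sort(seq, key):
        # Attach each element's original position: (e1, 0), (e2, 1), ...
        decorated = [(elem, i) for i, elem in enumerate(seq)]
        # Equal keys are ordered by original position.
        decorated.sort(key=lambda pair: (key(pair[0]), pair[1]))
        return [elem for elem, _ in decorated]

    children = [("Phil", 10), ("Hans", 10), ("Eva", 9),
                ("Anna", 9), ("Emil", 8), ("Jonas", 10)]
    print(stable_sort(children, key=lambda c: c[1]))
    # [('Emil', 8), ('Eva', 9), ('Anna', 9), ('Phil', 10), ('Hans', 10), ('Jonas', 10)]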