thenComparing vs sort - java-8

Are there any differences (e.g. performance, ordering) of the two versions:
version 1:
mylist.sort(myComparator.sort_item);
mylist.sort(myComparator.sort_post);
version 2:
// java 8
mylist.sort(myComparator.sort_item
.thenComparing(myComparator.sort_post));

Version 1: You are sorting by item, then throwing that sort away to sort by post instead. Effectively, the first sort is meaningless.
Version 2: You are sorting first by item, and in the event of a tie, breaking that tie using post.
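A minimal, runnable sketch of the difference (the Post class and its fields are made up here to stand in for the question's sort_item / sort_post comparators):

import java.util.*;

public class ThenComparingDemo {
    static class Post {
        final int item, post;
        Post(int item, int post) { this.item = item; this.post = post; }
        public String toString() { return "(" + item + "," + post + ")"; }
    }

    public static void main(String[] args) {
        Comparator<Post> sortItem = Comparator.comparingInt(p -> p.item);
        Comparator<Post> sortPost = Comparator.comparingInt(p -> p.post);

        List<Post> mylist = new ArrayList<>(Arrays.asList(
                new Post(2, 1), new Post(1, 2), new Post(1, 1)));

        // Version 1: the second call re-sorts the whole list by post,
        // so the final order is driven by post, not by item then post.
        List<Post> v1 = new ArrayList<>(mylist);
        v1.sort(sortItem);
        v1.sort(sortPost);
        System.out.println(v1); // [(1,1), (2,1), (1,2)]

        // Version 2: item is the primary key, post only breaks ties.
        List<Post> v2 = new ArrayList<>(mylist);
        v2.sort(sortItem.thenComparing(sortPost));
        System.out.println(v2); // [(1,1), (1,2), (2,1)]
    }
}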

From the Java 8 API documentation:
[thenComparing] Returns a lexicographic-order comparator with another
comparator. If this Comparator considers two elements equal, i.e.
compare(a, b) == 0, other is used to determine the order.
That means the second comparator is only used if the first one returns 0 (the elements are equal). So in practice it should be faster in most cases than calling sort twice.
In theory, if the sorting algorithm has time complexity C, then calling it twice is still C (a constant factor doesn't matter), so the asymptotic complexity of both approaches is the same.

Related

What are the stability "factors" of sorting?

Lately, I have been learning about various methods of sorting, and a lot of them are unstable, e.g. selection sort, quicksort, heap sort.
My question is: What are the general factors that make sorting unstable?
Most of the efficient sorting algorithms are efficient precisely because they move data over long distances, i.e. much closer to the final position with every move. This efficiency is what costs them their stability.
For example, when you do a simple sort like bubble sort, you compare and swap neighboring elements, so it is easy not to move elements that are already in the correct relative order. But in quicksort, the partitioning process may choose to move elements so that the number of swaps is minimal. For example, if you partition the list below around the value 2, the most efficient way is to swap the 1st element with the 4th and the 2nd element with the 5th:
2 3 1 1 1 4
1 1 1 2 3 4
If you notice, we have now changed the relative order of the 1's in the list, which makes the sort unstable.
So to sum it up, some algorithms are very suitable for stable sorting (like bubble-sort), whereas some others like quick sort can be made stable by carefully selecting a partitioning algorithm, albeit at the cost of efficiency or complexity or both.
We usually classify the algorithm to be stable or not based on the most "natural" implementation of it.
A sorting algorithm is stable when it uses the original order of elements to break ties in the new ordering. For example, let's say you have records of (name, age) and you want to sort them by age.
If you use a stable sort on (Matt, 50), (Bob, 20), (Alice, 50), then you will get (Bob, 20), (Matt, 50), (Alice, 50). The Matt and Alice records have equal ages, so they are equal according to the sorting criterion. The stable sort preserves their original relative order -- Matt came before Alice in the original list, so Matt comes before Alice in the output.
If you use an unstable sort on the same list, you might get (Bob, 20), (Matt, 50), (Alice, 50) or you might get (Bob, 20), (Alice, 50), (Matt, 50). Elements that compare equal will be grouped together but can come out in any order.
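A quick way to see this in code (Java is used here only to keep the examples in one language): Arrays.sort and List.sort on objects are documented to be stable, so sorting the records above by age preserves the Matt-before-Alice order.

import java.util.*;

public class StableSortDemo {
    static class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        public String toString() { return "(" + name + ", " + age + ")"; }
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>(Arrays.asList(
                new Person("Matt", 50), new Person("Bob", 20), new Person("Alice", 50)));

        // Java's object sort (TimSort) is stable, so equal ages keep their input order.
        people.sort(Comparator.comparingInt(p -> p.age));
        System.out.println(people); // [(Bob, 20), (Matt, 50), (Alice, 50)]
    }
}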
It's often handy to have a stable sort, but a stable sort implementation has to remember information about the original order of the elements while it is reordering them.
In-place array sorting algorithms are designed not to use any extra space to store this kind of information, and they destroy the original ordering while they work. The fast ones like quicksort aren't usually stable, because reordering the array in ways that preserve the original order to break ties is slow. Slow array sorting algorithms like insertion sort or selection sort can usually be written to be stable without difficulty.
Sorting algorithms that copy data from one place to another, or work with other data structures like linked lists, can be both fast and stable. Merge sort is the most common.
If you have an example input of
1 5 3 7 1
then for the sort to be stable, the last 1 must never end up before the first 1.
More generally, elements with the same value in the input array should not have changed their relative positions once sorted.
Then sorted would look like:
1(f) 3 5 7 1(l)
f: first, l: last (or second if there are more than 2).
For example, quicksort uses long-distance swaps, and because the comparisons are done with greater-than-or-equal (>=) or less-than, equally valued elements can be swapped past each other while sorting, and as a result end up reordered in the output.

Sorting a component-wise multi-value (SIMD) array

I'm trying to find an O(n∙log(n)) sorting method to sort several arrays simultaneously, so that an element of the multi-value array represents elements from 4 different single-value arrays, and the sorting method sorts these multi-value elements.
For example:
For 4 given single-value arrays Aₙ, Bₙ, Cₙ and Dₙ,
I'd set a new array Qn
so that Qᵢ = [ Aᵢ Bᵢ Cᵢ Dᵢ ].
Qᵢ may be changed during the process so that Qᵢ = [ A[aᵢ] B[bᵢ] C[cᵢ] D[dᵢ] ],
where aᵢ, bᵢ, cᵢ and dᵢ are index lists,
and of course Qᵢ ≤ Qᵢ₊₁ = [ A[aᵢ₊₁] B[bᵢ₊₁] C[cᵢ₊₁] D[dᵢ₊₁] ], meaning A[aᵢ] ≤ A[aᵢ₊₁], B[bᵢ] ≤ B[bᵢ₊₁] and so on.
The motivation is, of course, to use SIMD instructions to benefit from this structure and sort the 4 arrays separately.
I tried to use a SIMD comparer (_mm_cmplt_ps for example) and a masked swap (_mm_blendv_ps for example)
to make a modified version of traditional sorting algorithms (quick sort, heap sort, merge sort etc)
but I always encounter the problem that in theory there appear to be O(n∙log(n)) steps in the decision tree.
And so a decision, whether to set a pivot (quicksort) or whether to exchange a parent with one of its children (heap sort),
is not correct for all 4 components at the same time (and thus the next step - go right or left - is incorrect).
For now I only have O(n²) methods working.
Any ideas?
It sounds as though a sorting network is the answer to the question that you asked, since the positions of the comparators are not data-dependent. Batcher's bitonic mergesort is O(n log² n).
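A scalar Java sketch of the idea (Java is used only to keep the examples in one language; the inner per-lane loop is what a SIMD min/max or a _mm_blendv_ps-style blended swap would do in a single instruction). The comparator positions in a bitonic sorting network depend only on the indices, never on the data, so every lane can take its own min/max independently:

import java.util.Arrays;

public class BitonicLaneSort {
    // Sorts each of the component arrays packed into q[i][lane] independently.
    // n = q.length must be a power of two here (pad with +infinity otherwise).
    static void bitonicSort(float[][] q) {
        int n = q.length, lanes = q[0].length;
        for (int k = 2; k <= n; k <<= 1) {               // size of the bitonic blocks
            for (int j = k >> 1; j > 0; j >>= 1) {        // compare-exchange distance
                for (int i = 0; i < n; i++) {
                    int partner = i ^ j;
                    if (partner > i) {
                        boolean asc = (i & k) == 0;       // direction of this block
                        // Per-lane compare-exchange: each lane decides on its own,
                        // which is exactly what a masked/blended SIMD swap allows.
                        for (int lane = 0; lane < lanes; lane++) {
                            if ((q[i][lane] > q[partner][lane]) == asc) {
                                float t = q[i][lane];
                                q[i][lane] = q[partner][lane];
                                q[partner][lane] = t;
                            }
                        }
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        float[][] q = { {3, 7}, {1, 5}, {4, 2}, {2, 9} }; // two lanes, n = 4
        bitonicSort(q);
        // prints [[1.0, 2.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0]] - each lane sorted
        System.out.println(Arrays.deepToString(q));
    }
}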

Performance of counting sort

AFAIK counting sort uses the following algorithm:
// A: input array
// B: output array
// C: counting array
sort(A,B,n,k)
1. for(i:k) C[i]=0;
2. for(i:n) ++C[A[i]];
3. for(i:1..k-1) C[i]+=C[i-1];
4. for(i:n-1..0) { B[C[A[i]]-1]=A[i]; --C[A[i]]; }
What about I remove step 3 and 4, and do following?
3. t=0; for(i:k) while(C[i]) { --C[i]; B[t++]=i; }
Full code here; it looks fine, but I don't know which one has better performance.
Questions:
I guess the complexity of these two versions would be the same; is that true?
In steps 3 and 4 the first version needs to iterate n+k times, while the second one only needs to iterate n times. So does the second one have better performance?
Your code seems to be correct and it will work in case of sorting numbers. But, suppose you had an array of structures that you were sorting according to their keys. Your method will not work in that case because it simply counts the frequency of a number and while it remains positive assigns it to increasing indices in the output array. The classical method however will work for arrays of structures and objects etc. because it calculates the position that each element should go to and then copies data from the initial array to the output array.
To answer your question:
1> Yes, the runtime complexity of your code will be the same, because for an array of size n and range 0...k your inner and outer loops run a number of times proportional to f(0)+f(1)+...+f(k), where f denotes the frequency of a number. Therefore the runtime is O(n).
2> In terms of asymptotic complexity, both methods have the same performance. Due to the extra loop, the constants may be higher. But that extra work is also what makes the classical method a stable sort, with the benefits I pointed out earlier.
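For reference, a runnable Java version of the classical (stable) counting sort described above, using the same A/B/C names as the pseudocode:

public class CountingSort {
    // Sorts A into B, assuming all values are in the range 0..k-1.
    static int[] countingSort(int[] A, int k) {
        int n = A.length;
        int[] B = new int[n];
        int[] C = new int[k];
        for (int i = 0; i < n; i++) C[A[i]]++;         // step 2: count frequencies
        for (int i = 1; i < k; i++) C[i] += C[i - 1];   // step 3: prefix sums = end positions
        for (int i = n - 1; i >= 0; i--) {              // step 4: fill from the back -> stable
            B[C[A[i]] - 1] = A[i];
            C[A[i]]--;
        }
        return B;
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 3, 4, 3, 0, 2};
        System.out.println(java.util.Arrays.toString(countingSort(a, 5)));
        // [0, 1, 2, 3, 3, 4, 4]
    }
}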

Regarding shell sort algorithm

I am reading a book on algorithms. It says the following about Shellsort:
An important property of Shellsort (which we state without proof) is
that an hₖ-sorted file that is then hₖ₋₁-sorted remains hₖ-sorted. If
this were not the case, the algorithm would likely be of little value,
since work done by early phases would be undone by later phases.
My question is: what does the author mean by the above statement?
Thanks!
Shell sort is a multi-pass sorting algorithm. It works by sorting subsets of the array at a particular integer "stride" (gap) value k, i.e. only comparing and moving elements that are k positions apart.
Initially a large value is used for the stride; on subsequent passes this stride value is decreased, until the final pass is run with a stride of 1 (which is typically just a standard insertion sort) and the array is fully sorted.
The statement you've asked about merely says that any sorting that was done on earlier passes (larger stride values) is preserved by later passes (smaller stride values). If this wasn't the case there would be no point in the multi-pass approach used by shell sort.
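A compact Java sketch of that multi-pass structure (the gap sequence that simply halves each pass is used here only for illustration; the book's hₖ sequence may differ):

public class ShellSortDemo {
    static void shellSort(int[] a) {
        // Each pass is a gapped insertion sort; the final pass (gap == 1)
        // is a plain insertion sort over an array that is already "almost sorted".
        for (int gap = a.length / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < a.length; i++) {
                int v = a[i];
                int j = i;
                while (j >= gap && a[j - gap] > v) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = v;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {23, 5, 17, 9, 1, 42, 8};
        shellSort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 5, 8, 9, 17, 23, 42]
    }
}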
Hope this helps.

Stability of quicksort partitioning approach

Does the following Quicksort partitioning algorithm result in a stable sort (i.e. does it maintain the relative position of elements with equal values):
partition(A, p, r)
{
    x = A[r];
    i = p - 1;
    for j = p to r-1
    {
        if (A[j] <= x)
        {
            i++;
            exchange(A[i], A[j]);
        }
    }
    exchange(A[i+1], A[r]);
    return i+1;
}
There is one case in which your partitioning algorithm will make a swap that changes the order of equal values. Here is how the in-place partitioning works:
We march through each value with the j index, and if the value we see is less than or equal to the partition value, we append it to the "light-gray" subarray (the growing prefix of elements known to be <= the partition value) by swapping it with the element immediately to the right of that subarray. Now suppose the partition value is 4 and that, partway through the scan, three 9's sit at the front of the not-yet-examined region, followed by a 1. We look at the first 9 and see that it is not <= 4, so we leave it in place and march j forward. We look at the next 9, see that it is not <= 4, and leave it in place as well; the same goes for the third 9. Now we look at the 1 and see that it is less than the partition value, so we swap it with the first 9. Then, to finish the algorithm, we swap the partition value with the value at i+1, which is the second 9. The partition is now complete, and the 9 that was originally third comes first.
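A small Java reproduction of that scenario (the a/b/c labels are added only to make the equal 9's distinguishable; the partition follows the question's pseudocode):

import java.util.Arrays;

public class LomutoStabilityDemo {
    static class Item {
        final int key; final String tag;
        Item(int key, String tag) { this.key = key; this.tag = tag; }
        public String toString() { return key + tag; }
    }

    // The same Lomuto-style partition as in the question, on labeled items.
    static int partition(Item[] a, int p, int r) {
        int x = a[r].key;
        int i = p - 1;
        for (int j = p; j <= r - 1; j++) {
            if (a[j].key <= x) {
                i++;
                swap(a, i, j);
            }
        }
        swap(a, i + 1, r);
        return i + 1;
    }

    static void swap(Item[] a, int i, int j) { Item t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        Item[] a = { new Item(9, "a"), new Item(9, "b"), new Item(9, "c"),
                     new Item(1, ""),  new Item(4, "") };
        partition(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 4, 9c, 9a, 9b] - the 9's are reordered
    }
}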
Any sort can be converted to a stable sort if you're willing to add a second key. The second key should be something that indicates the original order, such as a sequence number. In your comparison function, if the first keys are equal, use the second key.
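In Java this is just another thenComparing step, assuming you have recorded each element's original index somewhere (the field names below are made up for illustration):

import java.util.*;

public class StableByIndexDemo {
    static class Row {
        final int key;        // the real sort key
        final int seq;        // original position, recorded before sorting
        Row(int key, int seq) { this.key = key; this.seq = seq; }
        public String toString() { return key + "#" + seq; }
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(Arrays.asList(
                new Row(5, 0), new Row(1, 1), new Row(5, 2), new Row(3, 3)));

        // Ties on key fall back to the original sequence number,
        // so even an unstable sort would produce a stable-looking result.
        rows.sort(Comparator.<Row>comparingInt(r -> r.key)
                            .thenComparingInt(r -> r.seq));
        System.out.println(rows); // [1#1, 3#3, 5#0, 5#2]
    }
}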
A sort is stable when the original order of similar elements doesn't change. Your algorithm isn't stable since it swaps equal elements.
If it didn't, then it still wouldn't be stable:
( 1, 5, 2, 5, 3 )
You have two elements with the sort key "5". If you compare element #2 (5) and #5 (3) for some reason, then the 5 would be swapped with 3, thereby violating the contract of a stable sort. This means that carefully choosing the pivot element doesn't help, you must also make sure that the copying of elements between the partitions never swaps the original order.
Your code looks suspiciously similar to the sample partition function given on Wikipedia, which isn't stable, so your function probably isn't stable either. At the very least you should make sure your pivot point r points to the last position among the values equal to A[r].
You can make quicksort stable (I disagree with Matthew Jones there), but not in its default and quickest (heh) form.
Martin (see the comments) is correct that a quicksort on a linked list can be stable if you take the first element as the pivot and append values to the end of the lower and upper sublists as you walk through the list. However, quicksort is supposed to work on a simple array rather than a linked list. One of the advantages of quicksort is its low memory footprint (because everything happens in place). If you're using a linked list you're already incurring a memory overhead for all the pointers to the next values etc., and you're swapping those rather than the values.
If you need a stable O(n*log(n)) sort, use mergesort. (The best way to make quicksort stable, by the way, is to choose a median of random values as the pivot. This is not stable when all elements are equivalent, however.)
Quicksort is not stable. Here is a case where it is not stable:
5 5 4 8
Taking the 1st 5 as the pivot, we will have the following after the 1st pass:
4 5 5 8
As you can see, the order of the 5's has changed. If we continue sorting, the order of the 5's in the sorted array will remain changed.
From Wikipedia:
Quicksort is a comparison sort and, in efficient implementations, is not a stable sort.
One way to address this problem is to not always take the last element of the array as the key (pivot). Quicksort is a randomized algorithm:
its performance depends heavily on the selection of the key. Although the algorithm definition says we should take the last or first element as the key, in reality we can select any element as the key.
So I tried the median-of-3 approach, which says: take the first, middle and last elements of the array, sort them, and then use the middle one as the key.
For example, if my array is {9,6,3,10,15}, then after sorting the first, middle and last elements it becomes {3,6,9,10,15}. Now use 9 as the key; moving the key to the end gives {3,6,15,10,9}.
All we need to take care of is what happens if 9 occurs more than once, that is, if the key itself occurs more than once.
In such cases, after selecting the middle element as the key, we go through the elements between the key's position and the right end, and if any element equal to the key is found (i.e. another 9 between the middle position and the end), we make that 9 the key.
Now, in the region of elements greater than 9 (the j loop), if any 9 is found, swap it into the region of elements less than the key (the i region). Your array will then be stably sorted.
