In which version of JDK 7 was MergeSort replaced with TimSort in the Collections.sort method?

I searched but couldn't find the version in which TimSort actually replaced MergeSort in the Collections.sort() method. If anyone can tell me the exact JDK 7 version, it would be a great help.

TimSort has been used in Java 7 right from the release. To be more specific, it was added in Java 7, beta 70.
It is worth noting that TimSort is still a modified merge sort; it just has further modifications that make it more efficient than the modified merge sort used in previous versions.

Related

Trying to get fast reverse memcmp in C++

I need to compare two char arrays as fast as possible and return which one is bigger. Normally I would use memcmp, but unfortunately my tool only has a pointer to data stored in reverse order, in other words a pointer to the LSB.
I've already read this question: How to do reverse memcmp?, but since it was asked in 2011 for C, I was hoping that there has been some improvement since then.
Looking at the first answer to Why is memcmp so much faster than a for loop check?, there should be a direction flag which would solve my problem, but I have no experience embedding assembler code in C++ at all, so debugging it or giving any support to others would be impossible for me.
Is there any built-in version of reverse memcmp which has been released since that question was first asked?
Any suggestions?

How was the cor() function sped up?

Slightly off-topic question, but I was wondering if anybody could tell me when and how the cor() function was improved recently? It is much, much faster than I remember and is now comparable in speed to the rcorr function in the Hmisc package, which was my alternative correlation function for large matrices.
Thanks for all the suggestions:
After some investigation, the difference in speed is due to using the use="pairwise" flag rather than an algorithmic change. There is a roughly 8-fold difference in speed when using this option.
The speed of cor() in R versions 2.4 through 2.13 is comparable.
Thanks,
Iain
http://cran.r-project.org/src/base/NEWS.html has a high level summary of recent changes, and explanations of their relevance. This is sometimes useful to pick up related changes in other functions that might affect what you're doing. A quick find for cor() only shows a couple things, however:
2.13.0
The rank-correlation methods for cor() and cov() with use = "complete.obs" computed the ranks before removing missing values, whereas the documentation implied incomplete cases were removed first. (PR#14488: https://bugs.R-project.org/bugzilla3/show_bug.cgi?id=14488)
2.11.0
cor() and cov() now test for misuse with non-numeric arguments, such as the non-bug report PR#14207 (https://bugs.R-project.org/bugzilla3/show_bug.cgi?id=14207).
Hard to say without knowing what version you're running, but it looks like there are some substantial changes coming in 2.14, and only minor changes between 2.13 and previous versions back to at least 2.10. Compare these to see the current changes coming in 2.14:
2.13 code:
https://svn.r-project.org/R/branches/R-2-13-branch/src/main/cov.c
2.14 code:
https://svn.r-project.org/R/branches/R-2-14-branch/src/main/cov.c

Which sorting algorithm is used by Microsoft's STL::list::sort()?

Note: I accidentally posted this question without specifying which STL implementation I was using, and I felt it can't really be updated since it would render most of its answers obsolete.
So, the correct question is: which sorting algorithm is used in the code below, assuming I'm using the STL library of Microsoft Visual C++?
list<int> mylist;
// ..insert a million values
mylist.sort();
Just so you don't have to rely on second-hand information, the sort code is right in the list header - it's about 35 lines.
Appears to be a modified iterative (non-recursive) merge sort with up to 25 bins (I don't know if there's a particular name for this variant of merge sort).
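A bin-based, non-recursive merge sort of the kind described above can be sketched as follows. This is only a Python illustration of the general technique, not the code from the Visual C++ list header; the names merge_sorted and bin_merge_sort, and the way the last bin absorbs overflow, are assumptions made for the example.

```python
def merge_sorted(a, b):
    """Stable merge of two sorted lists; on ties, elements of `a` come first."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def bin_merge_sort(items, max_bins=25):
    """Iterative merge sort: bin k is empty or holds one sorted run.

    Runs in higher bins hold elements that arrived earlier, so they are
    always passed as the first argument of merge_sorted to keep stability.
    """
    bins = []
    for item in items:
        carry = [item]
        k = 0
        # Like adding 1 to a binary counter: merge while the bin is occupied.
        while k < len(bins) and bins[k]:
            carry = merge_sorted(bins[k], carry)
            bins[k] = []
            k += 1
        if k < len(bins):
            bins[k] = carry
        elif len(bins) < max_bins:
            bins.append(carry)
        else:
            # Assumed overflow rule: the last bin keeps growing.
            bins[-1] = carry
    result = []
    for run in bins:
        result = merge_sorted(run, result)
    return result
```

With max_bins bins, no overflow merge is needed until roughly 2**max_bins elements have been inserted, which is why a small fixed number of bins is enough in practice.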
At least in recent versions (e.g. VC++ 9.0/VS 2008) MS VC++ uses a merge-sort.
The STL that came with MS VC6 was P. J. Plauger's version of the library (Dinkumware), and it used merge sort in std::list<>::sort(). I don't know about later versions of MS's package.
To my knowledge it is Introsort: http://en.wikipedia.org/wiki/Introsort

Is Quicksort a potential security risk?

I just wondered whether (with some serious paranoia and under certain circumstances) the use of the QuickSort algorithm can be seen as a security risk in an application.
Both its basic implementation and improved versions like median-of-3 quicksort have the peculiarity of behaving badly for certain input data, which means that their runtime can increase dramatically in these cases (O(n^2) complexity), not to mention the possibility of a stack overflow.
Hence I see potential to do harm by providing pre-sorted data to a program, causing the algorithm to behave like this, which could have unpredictable consequences for, e.g., a multi-client web application.
Is this strange case worth any security consideration (and would therefore force us to use Intro- or Mergesort instead)?
Edit: I know that there are ways to prevent Quicksort's worst cases, but what about language-integrated sorts (like the median-of-3 quicksort of .NET)? Would they be taboo?
Yes, it is a security risk - DoS, to be specific - which is trivially mitigated by adding a check for recursion depth in your quicksort, and switching to something else instead if a certain depth is reached. If you switch to heapsort, then you'll get introsort, which is what many STL implementations actually use.
Alternatively, you just randomize the selection of pivot element.
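The recursion-depth check mentioned above can be sketched like this. It is a minimal Python illustration under stated assumptions, not the code of any particular library: the function names are made up, the depth limit of about 2*log2(n) is an assumed choice, and the pivot is deliberately the naive first element so that pre-sorted input triggers the fallback.

```python
import heapq
import math


def _heap_sort_slice(data, lo, hi):
    """Sort data[lo:hi] in place with a heap (O(n log n) worst case)."""
    heap = data[lo:hi]
    heapq.heapify(heap)
    data[lo:hi] = [heapq.heappop(heap) for _ in range(len(heap))]


def _sort(data, lo, hi, depth_left):
    while hi - lo > 1:
        if depth_left == 0:
            # Too many partitioning rounds: assume we are going quadratic.
            _heap_sort_slice(data, lo, hi)
            return
        depth_left -= 1
        pivot = data[lo]  # naive pivot, bad for pre-sorted input
        # Three-way partition: [lo, lt) < pivot, [lt, gt) == pivot, [gt, hi) > pivot.
        lt, i, gt = lo, lo, hi
        while i < gt:
            if data[i] < pivot:
                data[lt], data[i] = data[i], data[lt]
                lt += 1
                i += 1
            elif data[i] > pivot:
                gt -= 1
                data[i], data[gt] = data[gt], data[i]
            else:
                i += 1
        _sort(data, lo, lt, depth_left)  # recurse on the left part
        lo = gt                          # loop on the right part


def introsort_sketch(items):
    """Return a sorted copy using depth-limited quicksort with a heap sort fallback."""
    data = list(items)
    if data:
        max_depth = 2 * max(1, int(math.log2(len(data))))
        _sort(data, 0, len(data), max_depth)
    return data
```

Because the recursion depth never exceeds the limit, pre-sorted input costs O(n log n) here and cannot overflow the stack.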
Many implementations of quicksort are done using a randomized version of the algorithm. This means a DoS attack with specially-crafted input is not possible.
Also, even without this, most data sets are simply too small for O(n log n) vs O(n^2) to matter. The size of the set to sort would have to be quite large to have an impact. Even with a few million elements, the time difference would likely not be very large.
Overall, any given web-application using quicksort is much more likely to have other security flaws.
Take a look at this question (and marked answer) which discusses ways of reducing QuickSort's worst case:
Why is quicksort better than mergesort?
If performance is something that matters, then QuickSort would seem a poor choice in most circumstances, security concern or not. Is there something that causes you to shy away from algorithms like Heapsort or Mergesort?
I think this is very much a question of where you're actually using the quick sort. Using O(n^2) algorithms is perfectly fine when you're working with arrays of 5 items, for instance. On the other hand, when there's a chance the data can be significantly large, fearing a DoS is not the first problem you'll face - the first problem will be getting bad performance way before you're facing a real problem. Given the large number of other algorithms available, just have it replaced if it's in a critical location.
It is, but only in very, very unlikely cases -- all of which are easy for a properly-designed algorithm to avoid.
But if you want to be super-safe, you may want to use something like Introsort, which starts out as QuickSort but switches over to Heap Sort if it detects from the recursion depth that the algorithm is starting to go quadratic.
Edit: I see Pavel beat me to Introsort.
In Response to Edited Question: I haven't personally tested every single Quicksort library, but I feel safe betting that pretty much all of them have checks in place to avoid the worst case.

Do you write code to sort a list these days? [closed]

Closed 11 years ago.
People in the Java/.NET world have frameworks which provide methods for sorting a list.
In CS, we have all probably gone through the Bubble/Insertion/Merge/Shell sorting algorithms.
Do you write any of them these days?
With frameworks in place, do you write code for sorting?
Do you think it makes sense to ask people to write code to sort in an interview? (other than for intern/junior developer positions)
There are two pieces of code I write today in order to sort data
list.Sort();
enumerable.OrderBy(x => x); // Occasionally a different lambda is used
I work for a developer tools company, and as such I sometimes need to write new RTL types and routines. Sorting is something developers need, so it's something I sometimes need to write.
Don't forget, all that library code wasn't handed down from some mountain: some developer somewhere had to write it.
I don't write the sorting algorithm, but I have implemented IComparer in .NET for a few classes, which was kind of interesting the first couple of times.
In most cases I wouldn't write the code for sorting, given what is in the frameworks. There should be an understanding of why a particular sorting strategy like Quicksort is often used in frameworks like .NET.
I could see giving or being given a sorting question where some of the work is implementing the IComparer and understanding the different ways to sort a class. It would be a fairly easy thing to show someone a Bubble sort and ask, "Why wouldn't you want to do this in most applications?"
I can say with 100% certainty that I haven't written one of the 'traditional' sort routines since leaving University. It's nice to know the theory behind it, but to apply them to real-world situations that can't be done by other means doesn't happen very often (at least from my experience...).
Only in an employer's interview/test =)
I wrote a merge sort when I had to sort multi-gigabyte files with a custom key comparison. I love merge sort - it's easy to comprehend, stable, and has a worst-case O(n log n) performance.
I've been looking for an excuse to try radix sort too. It's not as general purpose as most sorting algorithms, so there aren't going to be any libraries that provide it, but under the right circumstances it should be a good speedup.
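For readers who have not seen it, a least-significant-digit radix sort can be sketched as below. This is a minimal Python illustration that assumes non-negative integer keys; the function name radix_sort and the bits_per_pass parameter are made up for the example. It shows why the algorithm is less general-purpose than comparison sorts: it works on the digits of the key rather than on a comparison function.

```python
def radix_sort(values, bits_per_pass=8):
    """Return a sorted copy of non-negative integers using LSD radix sort."""
    values = list(values)
    if not values:
        return values
    if min(values) < 0:
        raise ValueError("this sketch only handles non-negative integers")
    mask = (1 << bits_per_pass) - 1
    largest = max(values)
    shift = 0
    # One stable bucketing pass per group of bits, least significant first.
    while (largest >> shift) > 0:
        buckets = [[] for _ in range(mask + 1)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)
        values = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return values
```

Each pass is linear in the number of values, so the total cost depends on the key width rather than on log n.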
Personally, I've not had a need to write my own sorting code for a while.
As far as interview questions go, it would weed out those who didn't pay attention during CS classes.
You could test API knowledge by asking how would you build Comparable (Capital C) objects, or something along those lines.
The way I see it, just like many other fields of knowledge, programming also has a theoretical and a practical approach to it.
The field of "theoretical programming" is the one that gave us quicksort, Radix Sort, Dijkstra's Algorithm and many other things absolutely necessary to the advance of computing.
The field of "practical programming" deals with the fact that the solutions created in "theoretical programming" should be easily accessible to all in a much easier way, so that the theoretical ideas can get many, many creative uses. This gave us high-level languages like Python and allowed pretty much any language to implement packaged methods for the most basic operations like sorting or searching with a good enough performance to be fit for almost everyone.
One can't live without the other...
Most of us not needing to hard code a sorting algorithm doesn't mean no one should.
I've recently had to write a sort, of sorts.
I had a list of text entries. The ten most common had to show up according to the frequency at which they were selected. All other entries had to show up in alphabetical order.
It wasn't crazy hard to do but I did have to write a sort to support it.
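An ordering like the one described (most-selected entries first, everything else alphabetical) might look like this in Python. This is a hedged sketch, not the original code: the names order_entries, entries and selections are hypothetical, entries are assumed to be unique strings, and ties in frequency are broken alphabetically as an assumption.

```python
from collections import Counter


def order_entries(entries, selections, top_n=10):
    """Put the top_n most-selected entries first, then the rest alphabetically."""
    counts = Counter(selections)
    # Highest count first; alphabetical order breaks ties (assumed rule).
    by_frequency = sorted(entries, key=lambda e: (-counts[e], e))
    top = [e for e in by_frequency[:top_n] if counts[e] > 0]
    top_set = set(top)
    rest = sorted(e for e in entries if e not in top_set)
    return top + rest
```

For example, order_entries(["pear", "apple", "fig", "kiwi"], ["kiwi", "kiwi", "pear"], top_n=2) returns ["kiwi", "pear", "apple", "fig"].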
I've also had to sort objects whose elements aren't easily sorted with an out of the box code.
Same goes for searching. I had to walk a file and search statically sized records. When I found a record I had to move one record back, because I was inserting before it.
For the most part it was very simple and I merely pasted in a binary search. Some changes were needed to support the method of access, because I wasn't using an array that was actually in memory. Ah c&#p.. I could have treated it like a stream. See, now I want to go back and take a look.
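A binary search over fixed-size records in a file, of the kind described above, can be sketched as follows. This is a Python illustration under assumed conditions, not the original code: records are assumed to be sorted by a fixed-width key stored at the start of each record, and the name find_insert_index and its parameters are hypothetical.

```python
import io


def find_insert_index(f, record_size, record_count, key, key_size):
    """Return the index of the first record whose key is >= `key`.

    `f` is a seekable binary file object; a new record with this key
    would be inserted before the returned index.
    """
    lo, hi = 0, record_count
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * record_size)          # jump straight to record `mid`
        record = f.read(record_size)
        if record[:key_size] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


# Usage with an in-memory stand-in for the file (three 4-byte records).
sample = io.BytesIO(b"aa01" + b"cc02" + b"ee03")
index = find_insert_index(sample, record_size=4, record_count=3, key=b"dd", key_size=2)
```

Here index is 2, the position of the record with key "ee", so the new record goes in before it.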
Man, if someone asked me in an interview what the best sort algorithm was, and didn't understand immediately when I said 'timsort', I'd seriously reconsider if I wanted to work there.
Timsort
This describes an adaptive, stable, natural mergesort, modestly called timsort (hey, I earned it <wink>). It has supernatural performance on many kinds of partially ordered arrays (less than lg(N!) comparisons needed, and as few as N-1), yet as fast as Python's previous highly tuned samplesort hybrid on random arrays.
In a nutshell, the main routine marches over the array once, left to right, alternately identifying the next run, then merging it into the previous runs "intelligently". Everything else is complication for speed, and some hard-won measure of memory efficiency.
http://svn.python.org/projects/python/trunk/Objects/listsort.txt
Is timsort general-purpose or Python-specific?
I haven't really implemented a sort, except as coding exercise and to observe interesting features of a language (like how you can do quicksort on one line in Python).
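The one-line quicksort alluded to above is usually some variation of the following. It is a sketch for illustration only: the name qsort is arbitrary, it uses the first element as the pivot, and it builds new lists instead of sorting in place, so it is not what you would use in production code.

```python
# One-line quicksort: smaller elements, then the pivot, then the rest.
qsort = lambda xs: xs if len(xs) <= 1 else qsort([x for x in xs[1:] if x < xs[0]]) + [xs[0]] + qsort([x for x in xs[1:] if x >= xs[0]])
```

For example, qsort([3, 1, 2]) returns [1, 2, 3].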
I think it's a valid question to ask in an interview because it reflects whether the developer thinks about these kinds of things... I feel it's important to know what that list.sort() is doing when you call it. It's just part of my theory that you should know the fundamentals behind everything as a programmer.
I never write anything for which there's a library routine. I haven't coded a sort in decades. Nor would I ever. With quicksort and timsort directly available, there's no reason to write a sort.
I note that SQL does sorting for me.
There are lots of things I don't write, sorts being just one of them.
I never write my own I/O drivers. (Although I have in the past.)
I never write my own graphics libraries. (Yes, I did this once, too, in the '80s)
I never write my own file system. (Avoided this.)
There is definitely no reason to code one anymore. I think it is important though to understand the efficiency of what you are using so that you can pick the best one for the data you are sorting.
Yes. Sometimes digging out Shell sort beats the builtin sort routine when your list is only expected to be at most a few tens of records.
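For reference, Shell sort itself is only a few lines. The following is a minimal Python sketch using the simple halving gap sequence, which is an assumption (other gap sequences exist); it makes no claim about whether it outperforms any built-in sort in a given environment.

```python
def shell_sort(items):
    """Return a sorted copy using Shell sort with gaps n/2, n/4, ..., 1."""
    data = list(items)
    gap = len(data) // 2
    while gap > 0:
        # Gapped insertion sort: each element moves back in steps of `gap`.
        for i in range(gap, len(data)):
            current = data[i]
            j = i
            while j >= gap and data[j - gap] > current:
                data[j] = data[j - gap]
                j -= gap
            data[j] = current
        gap //= 2
    return data
```

The final pass with a gap of 1 is a plain insertion sort, which is cheap because the earlier passes leave the data nearly ordered.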

Resources