Why don't we use three-pivot quicksort?

I was reading the research paper Multi-Pivot Quicksort: Theory and Experiments by S. Kushagra, where he shows experimentally that three-pivot quicksort performs better than dual-pivot quicksort. So why don't we use it in the language libraries of Java or C?

Why don't we use it in the language libraries of Java or C?
The 3-pivot algorithm in Multi-Pivot Quicksort: Theory and
Experiments by S. Kushagra is 'academic-strength'; i.e. it is evaluated on an
integer array whose input is a permutation of [1..N].
A robust 3-pivot version for referenced items (records/objects) in C is
available at:
https://github.com/ddccc/sixsort
A pdf paper about it is at:
https://github.com/ddccc/MultiPivotParallelQuicksort


Unknown syntax in IBM research paper

I'm reading the research paper High Performance Dynamic Lock-Free Hash Tables
and List-Based Sets (Maged M. Michael), and I don't understand the pseudo-code syntax used in its examples.
Specifically these parts:
〈pmark,cur,ptag〉: MarkPtrType;
〈cmark,next,ctag〉: MarkPtrType;
nodeˆ.〈Mark,Next〉←〈0,cur〉;
if CAS(prev,〈0,cur,ptag〉,〈0,node,ptag+1〉)
See e.g. page 5, section 3.
UPDATE:
The ˆ. seems to be a Pascal notation for dereferencing the pointer and accessing the variable in the record (https://stackoverflow.com/a/1814936/8524584).
The ← arrow looks like the Haskell do-notation operator for assignment, which binds the result of an operation to a variable. This notation long predates Haskell (it appears in APL and in mathematical pseudo-code), so it's probably notation that Haskell itself borrowed. (https://en.wikibooks.org/wiki/Haskell/do_notation#Translating_the_bind_operator)
The 〈a, b〉 looks like the mathematical angle-bracket notation for an inner product (https://mathworld.wolfram.com/InnerProduct.html), though that reading doesn't quite fit here.
The wide angle bracket notation seems to be either an ad-hoc list notation or manipulation of multiple variables on a single line (thanks #graybeard for pointing it out). It might even be some kind of tuple.
This is what 〈pmark,cur,ptag〉: MarkPtrType; would look like in a C like language:
MarkPtrType pmark;
MarkPtrType cur;
MarkPtrType ptag;
// or some list assignment notation
// or a tuple
The ← arrow is the APL assignment notation, also similar to Haskell's do-notation operator for binding results.

Faster way to create a sorted copy of a vector

Given an unsorted vector as source, the goal is to create a sorted copy as fast as possible.
There are two possibilities:
1. create a vector copy then sort it
2. or insert each element one after the other, to build the sorted vector incrementally.
Which is theoretically the faster way?
This question has very little to do with C++ specifically (except using the word "vector", which also exists in many other languages). Since this reads like an exercise, I really recommend writing a program to test it out:
1. write a program that builds a vector of random N integers, v1
2. copy the vector to v2, via std::copy
3. time how long it takes to use insertion-sort (option 2 above), using a loop
4. time how long std::sort(v2.begin(), v2.end()) takes
You can time things using different timers, either the old <ctime> header or the newer <chrono> one. See this answer for alternatives.
You should find that, for small sizes of N, the loop from step 3 is faster than or equivalent to step 4 - and from a few hundred elements onward, std::sort becomes better and better. Welcome to asymptotic complexity and big-O notation!
Answers can be found in the following nice conference talk: in short, it depends on the size of the vector. Hence Peter's comment is the best one.
CppCon 2018: Frédéric Tingaud, "A Little Order: Delving into the STL sorting algorithms"

Haskell function nub inefficient

I'm confused by the implementation of the 'nub' (select unique values) function in the Haskell standard library Data.List. The GHC implementation is
nub l = nub' l []
  where
    nub' [] _ = []
    nub' (x:xs) ls
      | x `elem` ls = nub' xs ls
      | otherwise   = x : nub' xs (x:ls)
As far as I can tell, this has a worst-case time complexity of O(n^2), since for a list of unique values it has to compare them all once to see that they are in fact unique.
If one used a hash table, the complexity could be reduced to O(n) for building the table + O(1) for checking each value against previous values in the hash table. Granted, this would not produce an ordered list but that would also be possible in O(n log n) using GHC's own ordered Data.Map, if that is necessary.
Why choose such an inefficient implementation for an important library function? I understand efficiency is not a main concern in Haskell but at least the standard library could make an effort to choose the (asymptotically) best data structure for the job.
Efficiency is quite a concern in Haskell; after all, the language performs on par with Java and beats it in terms of memory consumption, though of course it's not C.
The answer to your question is pretty simple: the Prelude's nub requires only an Eq constraint, while any implementation based on Map or Set would also require either an Ord or Hashable.
You're absolutely correct - nub is an O(n^2) algorithm. However, there are still reasons why you might want to use it instead of using a hashmap:
for small lists it still might be faster
nub only requires the Eq constraint; by comparison Data.Map requires an Ord constraint on keys and Data.HashMap requires keys that are both Hashable and Eq
it's lazy - you don't have to run through the entire input list to start getting results
Edit: Slight correction on the third point -- you don't have to process the entire list to start getting results; you'll still have to examine every element of the input list (so nub won't work on infinite lists), but you'll start returning results as soon as you find a unique element.
https://groups.google.com/forum/m/#!msg/haskell-cafe/4UJBbwVEacg/ieMzlWHUT_IJ
In my experience, "beginner" Haskell (including Prelude and the bad packages) simply ignores performance in many cases, in favor of simplicity.
Haskell performance is a complex problem to solve, so if you aren't experienced enough to search through Platform or Hackage for alternatives to the simple nub (and especially if your input is in a List just because you haven't thought about alternative structures), then Data.List.nub is likely not your only major performance problem and also you are probably writing code for a toy project where performance doesn't really matter.
You just have to have faith that when you get to building a large (in code or data) project, you will be more experienced and know how to set up your programs more efficiently.
In other words, don't worry about it, and assume that anything in Haskell 98 that comes from Prelude or base is likely to not be the most efficient way to solve a problem.

Is there a module that implements an efficient array type in Erlang?

I have been looking for an array type with the following characteristics in Erlang.
append(vector(), term()) O(1)
nth(Idx, vector()) O(1)
set(Idx, vector(), term()) O(1)
insert(Idx, vector(), term()) O(N)
remove(Idx, vector()) O(N)
I normally use a tuple for this purpose, but the performance characteristics are not what I would want for large N. My testing shows the following performance characteristics...
erlang:append_element/2 O(N).
erlang:setelement/3 O(N).
I have started on a module based on the clojure.lang.PersistentVector implementation, but if it's already been done I won't reinvent the wheel.
Edit:
For those interested, I've finished implementing vector.erl ... using the same algorithm as clojure.lang.PersistentVector. It has similar performance characteristics to the array module, but slightly better constant factors on append.
The following test appends 10000 items per interval and then does 10000 lookups and 10000 updates at random indices. All operations are near O(1). The timings in the inner tuples are in microseconds.
3> seq_perf:test(vector, 100000, 10).
{2685854,
{ok,[{100000,{66966,88437,124376}},
{200000,{66928,76882,125677}},
{300000,{68030,76506,116753}},
{400000,{72429,76852,118263}},
{500000,{66296,84967,119828}},
{600000,{66953,78155,116984}},
{700000,{65996,77815,138046}},
{800000,{67801,78455,118191}},
{900000,{69489,77882,114886}},
{1000000,{67444,80079,118428}}]}}
4> seq_perf:test(array, 100000, 10).
{2948361,
{ok,[{100000,{105482,72841,108828}},
{200000,{123655,78898,124092}},
{300000,{110023,76130,106806}},
{400000,{104126,73830,119640}},
{500000,{104771,72593,110157}},
{600000,{107306,72543,109713}},
{700000,{122066,73340,110662}},
{800000,{105853,72841,110618}},
{900000,{105267,73090,106529}},
{1000000,{103445,73206,109939}}]}}
Those properties are not possible in a purely functional implementation. The array module (http://www.erlang.org/doc/man/array.html) is a quite good compromise, but if you require O(1) lookup and update, you'll have to use an ETS table instead.

Are there any programming languages that start counting from 1?

High-level programming languages are meant to be understandable to humans, but 0 is usually not considered a natural number in mathematics. I do not understand why all the programming languages I have seen start counting from 0, e.g. int[0] is the 1st element instead of int[1]. Are there any programming languages that support counting from 1? If not, why not?
Yes, lots. Fortran for example.
And then there are languages which allow array elements to start indexing at almost any integer. Fortran for example.
Not so many (considering the total number of programming languages):
ALGOL 68
APL
AWK
CFML
COBOL
Fortran
FoxPro
Informix
Julia
Lua
Mathematica
MATLAB
PL/I
Ring
RPG
Sass
Smalltalk
Wolfram Language
XPath/XQuery
You can do it in older versions of Perl:
$[ = 1; # set the base array index to 1 (deprecated since Perl 5.12 and removed in later releases)
Erlang's tuples and lists index starting at 1.
Sources
Wikipedia
