What is the difference between straight selection sort vs. exchange selection sort? I got into a little debate today - my professor uses these two terminologies in his lecture notes. The selection sort that Wikipedia and any textbook or website will give you is what he is calling "exchange selection sort".
I've never heard the term "exchange selection sort" used before (only "selection sort"), and cannot find any relevant resources on the former terminology online. Also, "exchange sort" redirects to bubble sort on Wikipedia.
I've also never heard the term "straight selection sort" used before, and cannot find any relevant resources online. His notes state that it's a version of selection sort which uses a secondary array rather than an in-place sort, populating it one-by-one from the smallest to largest element. When I brought up the issue he claimed it was older and that just because it doesn't come up on Google doesn't mean it's incorrect. However, I've found far more obscure things on Google, and something like selection sort is going to have a massive amount of resources on the web.
So, do these algorithms go by other names? Does he simply have the names wrong? Who is right?
I hadn't heard those exact terms before but they make sense to me. I don't think the terminology is really that important as long as you understand what they're doing.
If you're creating a sorted copy of a list, you can create each item in the new list one-by-one from the minimum of the old list; ‘straight’ seems as reasonable a description as any for this.
OTOH if you're sorting a list in-place then each time you move a new item to the head of the list, you have to move the item that was previously there backwards to make room. In an array-backed list the cheapest way to do that is just to let the new minimum item and the old item swap places: an exchange. (In a linked list it would be quicker to let the whole tail of the list slide back one place.)
Textbooks tend to concentrate on in-place sorting.
Both algorithms use the same basic technique. The only difference is that in selection sort, only the index of the smallest element seen so far is updated as each comparison is made, and the swap happens once at the end of an iteration. In exchange sort, on the other hand, the swap happens as soon as the element on the right side is smaller than the one at the current position. (A short code sketch follows the video links below.)
For reference, you can see these 2-minute videos:
Selection sort: https://www.youtube.com/watch?v=JU767SDMDvA
Exchange sort: https://www.youtube.com/watch?v=v0ipy1h-TPM
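For concreteness, here is a minimal C++ sketch of the two variants as just described (the function names are my own):

#include <cstddef>
#include <utility>
#include <vector>

// Selection sort: remember the index of the minimum during the scan and
// swap once at the end of each iteration.
void selection_sort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t min_index = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[min_index])
                min_index = j;          // only the index is updated here
        std::swap(a[i], a[min_index]);  // one swap per iteration
    }
}

// Exchange sort: swap immediately whenever a smaller element is found
// to the right of the current position.
void exchange_sort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[i])
                std::swap(a[i], a[j]);  // swap as soon as a smaller element appears
}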
To keep things short: I am learning about data structures and have started building my own library (for practice). While comparing my implementation to the existing STL one I saw quite a few differences. That is, when I insert an element into my singly linked list I push elements to the back of the list, while C++11's forward_list only allows pushing to the front (which actually makes sense, because the insertion has to be of complexity O(1)). My question is: why do the books that present the idea of a singly linked list push and pop from the back, while the STL (which is supposed to be the right one) pushes and pops from the front? Is one of the two correct, or is it just a matter of preference?
Please don't be harsh on me if the question sounds silly, but I have literally spent days trying to understand this and I didn't manage to find any trustworthy help online. Thank you very much for taking your time!
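For what it's worth, here is a minimal sketch of the trade-off described in the question (the Node and SinglyLinkedList types are illustrative, not from any book or from the STL): pushing to the front of a singly linked list is O(1), while pushing to the back is O(n) unless you also maintain a tail pointer.

// Minimal singly linked list of ints, for illustration only
// (no destructor or error handling).
struct Node {
    int value;
    Node* next;
};

struct SinglyLinkedList {
    Node* head = nullptr;

    // O(1): the new node simply becomes the new head.
    void push_front(int v) { head = new Node{v, head}; }

    // O(n) without a tail pointer: we must walk to the last node first.
    void push_back(int v) {
        Node* node = new Node{v, nullptr};
        if (!head) { head = node; return; }
        Node* cur = head;
        while (cur->next) cur = cur->next;
        cur->next = node;
    }
};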
Can someone explain what "stable" and "unstable" mean in relation to various sorting algorithms? How can one determine whether an algorithm is stable or not, and what applications do unstable sorting algorithms usually have (since they are unstable)?
If a sorting algorithm is said to be "unstable", this means that for any items that rank the same, the order of the tied members is not guaranteed to stay the same with successive sorts of that collection. For a 'stable' sort, the tied entries will always end up in the same order when sorted.
For an example of applications, the quick sort algorithm is not stable. This would work fine for something like sorting actions by priority (if two actions are of equal priority, you would not be likely to care about which elements of a tie are executed first).
A stable sorting algorithm, on the other hand, is good for things like a leaderboard for an online game. If you were to use an unstable sort, sorting by points (for instance), then a user viewing the sorted results on a webpage could experience different results on page refreshes and operations like paging through results would not function correctly.
A stable sort retains the order of identical items. Any sort can be made stable by appending the row index to the key. Unstable sorts, like heap sort and quick sort, do not have this property inherently, but they are used because they tend to be faster and easier to code than stable sorts. As far as I know there are no other reasons to use unstable sorts.
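As an illustration of that row-index trick, here is a hedged C++ sketch (the Player type and the data are made up): tagging each element with its original position and using the position as a tie-breaker makes std::sort, which is not guaranteed to be stable, produce the same ordering a stable sort would.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

struct Player {
    std::string name;
    int score;
};

int main() {
    std::vector<Player> players = {
        {"alice", 50}, {"bob", 70}, {"carol", 50}, {"dave", 70}};

    // Pair each element with its original index so ties can be broken
    // by original position.
    std::vector<std::pair<Player, std::size_t>> keyed;
    for (std::size_t i = 0; i < players.size(); ++i)
        keyed.push_back({players[i], i});

    std::sort(keyed.begin(), keyed.end(),
              [](const auto& a, const auto& b) {
                  if (a.first.score != b.first.score)
                      return a.first.score > b.first.score;  // by score, descending
                  return a.second < b.second;                // tie: keep original order
              });

    // Ties (bob/dave at 70, alice/carol at 50) keep their original
    // relative order, even though std::sort itself is unstable.
    for (const auto& [p, idx] : keyed)
        std::cout << p.name << " " << p.score << "\n";
}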
I have come up with a data structure that combines some of the advantages of linked lists with some of the advantages of fixed-size arrays. It seems very obvious to me, and so I'd expect someone to have thought of it and named it already. Does anyone know what this is called:
Take a small fixed-size array. If the number of elements you want to put in your array is greater than the size of the array, add a new array and whatever pointers you like between the old and the new.
Thus you have:
Static array
—————————————————————————
|1|2|3|4|5|6|7|8|9|a|b|c|
—————————————————————————
Linked list
————  ————  ————  ————  ————
|1|*->|2|*->|3|*->|4|*->|5|*->NULL
————  ————  ————  ————  ————
My thing:
————————————  ————————————
|1|2|3|4|5|*->|6|7|8|9|a|*->NULL
————————————  ————————————
Edit: For reference, this algorithm provides pretty poor worst-case addition/deletion performance, and not much better average-case. The big advantage for my scenario is the improved cache performance for read operations.
Edit re bounty: Antal S-Z's answer was so complete and well-researched that I wanted to provide em with a bounty for it. Apparently Stack Overflow doesn't let me accept an answer as soon as I've offered a bounty, so I'll have to wait (admittedly I am abusing the intention of the bounty system somewhat, although it's in the name of rewarding someone for an excellent answer). Of course, if someone does manage to provide a better answer, more power to them, and they can most certainly have the bounty instead!
Edit re names: I'm not interested in what you'd call it, unless you'd call it that because that's what authorities on the subject would call it. If it's a name you just came up with, I'm not interested. What I want is a name that I can look up in text books and with Google. (Also, here's a tip: Antal's answer is what I was looking for. If your answer isn't "unrolled linked list" without a very good reason, it's just plain wrong.)
It's called an unrolled linked list. There appear to be a couple of advantages, one in speed and one in space. First, if the number of elements in each node is appropriately sized (e.g., at most the size of one cache line), you get noticeably better cache performance from the improved memory locality. Second, since you have O(n/m) links, where n is the number of elements in the unrolled linked list and m is the number of elements you can store in any node, you can also save an appreciable amount of space, which is particularly noticeable if each element is small. When constructing unrolled linked lists, apparently implementations will generally try to leave space in the nodes; when you try to insert in a full node, you move half the elements out. Thus, at most one node will be less than half full. And according to what I can find (I haven't done any analysis myself), if you insert things randomly, nodes tend to actually be about three-quarters full, or even fuller if operations tend to be at the end of the list.
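To make that concrete, here is a rough C++ sketch of one possible node layout and the insert-with-split behaviour described above (the capacity, the names and the exact splitting policy are my own illustrative choices, not taken from any particular implementation):

#include <cstddef>

constexpr std::size_t kCapacity = 8;  // ideally sized so a node fits a cache line

// One node holds a small fixed-size array of elements plus a link.
struct UnrolledNode {
    int elements[kCapacity];
    std::size_t count = 0;
    UnrolledNode* next = nullptr;
};

// Insert 'value' at position 'pos' within 'node'. If the node is full,
// move the second half of its elements into a fresh node first, so no
// node (other than possibly one) ends up less than half full.
void insert_in_node(UnrolledNode* node, std::size_t pos, int value) {
    if (node->count == kCapacity) {
        UnrolledNode* fresh = new UnrolledNode;
        std::size_t half = kCapacity / 2;
        for (std::size_t i = half; i < kCapacity; ++i)
            fresh->elements[i - half] = node->elements[i];
        fresh->count = kCapacity - half;
        node->count = half;
        fresh->next = node->next;
        node->next = fresh;
        if (pos > node->count) {  // insertion point moved into the new node
            insert_in_node(fresh, pos - node->count, value);
            return;
        }
    }
    for (std::size_t i = node->count; i > pos; --i)  // shift within the small array
        node->elements[i] = node->elements[i - 1];
    node->elements[pos] = value;
    ++node->count;
}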
And as everyone else (including Wikipedia) is saying, you might want to check out skip lists. Skip lists are a nifty probabilistic data structure used to store ordered data with O(log n) expected running time for insert, delete, and find. It's implemented by a "tower" of linked lists, each layer having fewer elements the higher up it is. On the bottom, there's an ordinary linked list, having all the elements. At each successive layer, there are fewer elements, by a factor of p (usually 1/2 or 1/4). The way it's built is as follows. Each time an element is added to the list, it's inserted into the appropriate place in the bottom row (this uses the "find" operation, which can also be made fast). Then, with probability p, it's inserted into the appropriate place in the linked list "above" it, creating that list if it needs to; if it was placed in a higher list, then it will again appear above with probability p. To query something in this data structure, you always check the top lane, and see if you can find it. If the element you see is too large, you drop to the next lowest lane and start looking again. It's sort of like a binary search. Wikipedia explains it very well, and with nice diagrams. The memory usage is going to be worse, of course, and you're not going to have the improved cache performance, but it is generally going to be faster.
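For reference, here is a rough C++ sketch of the search and insert paths just described, with promotion probability p = 1/2 (the maximum level, the names and the types are my own choices):

#include <cstdlib>
#include <vector>

constexpr int kMaxLevel = 16;

struct SkipNode {
    int key;
    std::vector<SkipNode*> forward;  // one forward pointer per level
    SkipNode(int k, int levels) : key(k), forward(levels, nullptr) {}
};

struct SkipList {
    SkipNode head{0, kMaxLevel};  // sentinel; its key is never read
    int level = 1;                // number of levels currently in use

    bool contains(int key) {
        SkipNode* cur = &head;
        for (int i = level - 1; i >= 0; --i)      // start in the top lane
            while (cur->forward[i] && cur->forward[i]->key < key)
                cur = cur->forward[i];            // move right while keys are smaller
        cur = cur->forward[0];                    // drop to the bottom lane
        return cur && cur->key == key;
    }

    void insert(int key) {
        std::vector<SkipNode*> update(kMaxLevel, &head);
        SkipNode* cur = &head;
        for (int i = level - 1; i >= 0; --i) {
            while (cur->forward[i] && cur->forward[i]->key < key)
                cur = cur->forward[i];
            update[i] = cur;                      // last node before 'key' at level i
        }
        int node_level = 1;
        while (node_level < kMaxLevel && std::rand() % 2 == 0)
            ++node_level;                         // promote with probability 1/2
        if (node_level > level) level = node_level;
        SkipNode* node = new SkipNode(key, node_level);
        for (int i = 0; i < node_level; ++i) {
            node->forward[i] = update[i]->forward[i];
            update[i]->forward[i] = node;
        }
    }
};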
References
“Unrolled Linked List”, http://en.wikipedia.org/wiki/Unrolled_linked_list
“Unrolled Linked Lists”, Link
“Skip List”, http://en.wikipedia.org/wiki/Skip_list
The skip list lecture(s) from my algorithms class.
CDR coding (if you're old enough to remember Lisp Machines).
Also see ropes, which are a generalization of this list/array idea for strings.
I would call this a bucket list.
While I don't know your task, I would strongly suggest you have a look at skip lists.
As for a name, I'm thinking a bucket list would probably be most apropos.
You can call it LinkedArrays.
Also, I would like to see the pseudo-code for the removeIndex operation.
What are the advantages of this data structure in terms of insertion and deletion?
Ex:
What if you want to add an element between 3 and 4? You still have to do a shift, which takes O(n).
How do you find out the correct bucket for elementAt?
I agree with jer, you should definitely take a look at skip lists. They bring the advantages of linked lists and arrays. Most operations are done in O(log n).
As a learning exercise, I've just had an attempt at implementing my own 'merge sort' algorithm. I did this on a std::list, which apparently already has the functions sort() and merge() built in. However, I'm planning on moving this over to a linked list of my own making, so the implementation is not particularly important.
The problem lies with the fact that a std::list doesn't have facilities for accessing random nodes, only for accessing the front/back and stepping through. I was originally planning on somehow performing a simple binary search through this list, and finding my answer in a few steps.
The fact that there are already built-in functions in a std::list for performing these kinds of ordering leads me to believe that there is an equally easy way to access the list in the way I want.
Anyway, thanks for your help in advance!
The way a linked list works is that you step through the items in the list one at a time. By definition there is no way to jump straight to a "random" element in the list; you can only walk to it from one end. The sort() method you refer to works by stepping through the nodes one at a time and relinking them into the correct order, so it never needs random access either.
You'll need to store the data differently if you want to access it randomly. Perhaps an array of the elements you're storing.
Further information on linked lists: http://en.wikipedia.org/wiki/Linked_list
A merge sort doesn't require access to random elements, only to elements from one end of the list.
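For illustration, here is a minimal sketch of a merge sort over a hand-rolled singly linked list (the Node type and the function names are mine, not the questioner's): it splits with the slow/fast pointer trick and merges by relinking nodes, so it only ever walks forward from the head.

struct Node {
    int value;
    Node* next;
};

// Merge two already-sorted lists by repeatedly taking the smaller head.
Node* merge(Node* a, Node* b) {
    Node dummy{0, nullptr};
    Node* tail = &dummy;
    while (a && b) {
        Node*& smaller = (a->value <= b->value) ? a : b;
        tail->next = smaller;
        tail = smaller;
        smaller = smaller->next;
    }
    tail->next = a ? a : b;
    return dummy.next;
}

// Split at the midpoint with slow/fast pointers, sort each half, merge.
Node* merge_sort(Node* head) {
    if (!head || !head->next) return head;
    Node* slow = head;
    Node* fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    Node* second = slow->next;
    slow->next = nullptr;  // cut the list in two
    return merge(merge_sort(head), merge_sort(second));
}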
Phil Bagwell, in his 2002 paper on the VList data structure, indicates that you can use a VList to implement a persistent hash table. However, his explanation of how that worked didn't include much detail, and I don't understand it. Can anybody give me a more detailed explanation, or even examples?
Further, it appears to me from what I can see that this data structure, while it may have the same big-O complexity as a Hashtable, will be slower because it does additional lookups. Does anybody care to do a detailed analysis of just how much slower, preferably including cache behaviour? How does the performance relationship between the two change in the case of having no collisions or many?
I had a look at this paper, and it appears very preliminary. The fact that no later version has been published, and that the original appeared in IFL (which is a work-in-progress sort of meeting), suggests that you may be wasting your time.
Hrmm, there seem to be a number of issues with the data structures proposed by the paper in question.
Off the cuff, the naive vlists mentioned first seem to need unique references in order to get anything near the time guarantees proposed. You lose the ability, for the most part, to share tails. You can share the tiny nodes towards the back of the list, but you wind up having to duplicate the largest vlist node the moment you cons something onto the cdr of a vlist that is still active. That cost is proportional to the cost of copying the whole list.
With the 2d modifications mentioned later it becomes constant again, but it's a pretty large constant, since you wind up at least copying the head of a list of pages (or worse, a vlist) and the first page in your list.
The functional hash list stuff in there didn't seem to make much sense to me, to be honest. It was just a brief blurb that seemed to be bolted onto an otherwise unrelated paper, without enough detail to really make out how practical it is.