Algorithm for reorganizing fragmented linked list - performance

The answer here has some basic stats that show a packed linked list would perform pretty fast sorting with merge sort:
What's the fastest algorithm for sorting a linked list?
I am wondering if there are any techniques (sort of like background jobs like garbage collection) that can optimally reorganize a linked list to become more compact at runtime.

Are you trying to save space or time? You could always allocate a new "packed" list and copy/sort the fragmented list into the new "packed" list.
I'm wondering about the results in that prior answer, and suspect that the merge sort on the packed list was operating on an already sorted list left over from the prior tests. In my testing, I started off with a large (bigger than cache) packed linked list where the values were pseudo-random integers. Sorting it resulted in a sorted list whose nodes were pseudo-randomly ordered in memory, what I call a "scattered" (non-cache-friendly) linked list. I then filled that list with yet another set of pseudo-random numbers to test merge sort on that "scattered" linked list.
The fastest linked list sort (assuming near random data) is a bottom up merge sort using a small (25 to 32) array of pointers (or references) to lists. Wiki article for the basic algorithm:
https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists
The issue here is that the merge sort is working with a "scattered" linked list, which is not cache friendly.
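To make the bottom-up approach concrete, here is a minimal sketch in Python. It illustrates the idea from the Wiki article and is not the code that was benchmarked. The names `Node`, `merge`, `sort_list`, `from_values` and `to_values` are made up for this example. Slot `i` of the small array holds either nothing or a sorted run of about 2**i nodes, and the last slot absorbs anything larger.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def merge(a, b):
    """Merge two sorted lists; ties are taken from a, which keeps the sort stable."""
    dummy = tail = Node(None)
    while a is not None and b is not None:
        if b.value < a.value:
            tail.next, b = b, b.next
        else:
            tail.next, a = a, a.next
        tail = tail.next
    tail.next = a if a is not None else b
    return dummy.next

def sort_list(head, slots=32):
    """Bottom-up merge sort driven by a small array of list heads."""
    bins = [None] * slots
    while head is not None:
        # Detach one node and treat it as a sorted run of length 1.
        node, head = head, head.next
        node.next = None
        i = 0
        # Carry the run upward, like adding 1 to a binary counter.
        while i < slots and bins[i] is not None:
            node = merge(bins[i], node)
            bins[i] = None
            i += 1
        if i == slots:      # every slot was full: keep the big run in the last slot
            i -= 1
        bins[i] = node
    # Merge the remaining runs, oldest run first so stability is preserved.
    result = None
    for run in bins:
        if run is not None:
            result = merge(run, result)
    return result

def from_values(values):
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head

def to_values(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

Note that only links are rewritten; no node is moved or copied, so a scattered list stays scattered after the sort.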
As mentioned in the prior answer, but conflicting with the results, it should be faster to copy the linked list to an array, then sort the array with either quick sort or merge sort (either of these array sort methods would be cache friendly), then create a new sorted packed list from the now sorted array. This assumes that space isn't an issue with allocating the arrays or the packed list.
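A rough sketch of the copy-sort-rebuild approach, in Python. `Node`, `sort_via_array`, `from_values` and `to_values` are hypothetical names for this example. Python gives no control over where nodes are placed in memory, so this only shows the steps; in C or C++ the new nodes would be allocated from one contiguous block so that the rebuilt list really is packed.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def sort_via_array(head):
    """Copy the values into an array, sort the array, then build a fresh list."""
    values = []
    node = head
    while node is not None:          # one pass over the (possibly scattered) list
        values.append(node.value)
        node = node.next
    values.sort()                    # array sort, cache friendly in a lower-level language
    new_head = None
    for v in reversed(values):       # allocate the new nodes in sorted order
        new_head = Node(v, new_head)
    return new_head

def from_values(values):
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head

def to_values(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

The old list is left untouched, which is where the extra space mentioned above comes from.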
Other alternatives would be to create an array of pointers to nodes to sort the data in the list in place (without relinking the nodes), or creating an array of pointers to nodes, sorting the array of pointers, then relinking the list nodes according to the pointers. Again the issue here is that these methods would be working with a "scattered" linked list, which is not cache friendly.

Related

Binary Search-accessing the middle element drawback

I am studying from my course book on Data Structures by Seymour Lipschutz and I have come across a point I don’t fully understand.
Binary Search Algorithm assumes that one has direct access to middle element in the list. This means that the list must be stored in some type of linear array.
I read this and also recognised that in Python you can have access to the middle element at all times. Then the book goes onto say:
Unfortunately, inserting an element in an array requires elements to be moved down the list, and deleting an element from an array requires element to be moved up the list.
How is this a drawback?
Won’t we still be able to access the middle element by dividing the length of array by 2?
In the case where the array will not be modified, the cost of insertion and deletion are not relevant.
However, if an array is to be used to maintain a sorted set of non-fixed items, then insertion and deletion costs are relevant. In this case, binary search can be used to find items (possibly for deletion) and/or find where new items should be inserted. The drawback is that insertion and deletion require movement of other elements.
Python's bisect module provides binary search functionality that can be used for locating insertion points for maintaining sorted order. The drawback mentioned applies.
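As a small illustration of that point, here is a sketch using `bisect.bisect_left` from the standard library. The helper names `add_sorted` and `remove_sorted` are made up for this example. Finding the position is a binary search, but `list.insert` and `del` still have to shift every later element.

```python
import bisect

def add_sorted(items, x):
    """Insert x into the sorted list items, keeping it sorted."""
    pos = bisect.bisect_left(items, x)   # binary search: O(log n) comparisons
    items.insert(pos, x)                 # shifts everything after pos: O(n)
    return pos

def remove_sorted(items, x):
    """Remove one occurrence of x if present; return whether it was found."""
    pos = bisect.bisect_left(items, x)
    if pos < len(items) and items[pos] == x:
        del items[pos]                   # shifts everything after pos: O(n)
        return True
    return False
```

The standard library also offers `bisect.insort`, which combines the search and the insert in one call.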
In some cases, a binary search tree may be a preferable alternative to a sorted array for maintaining a sorted set of non-fixed items.
It seems that the author is comparing array-like structures with linked lists.
The first kind (array, Python and Java list, C++ vector) allows fast and simple access to any element by index, but appending, inserting or deleting might cause memory to be reallocated or elements to be moved.
With the second kind we cannot address the i-th element directly; we need to traverse the list from the beginning. But once we have reached an element, we can insert or delete there quickly.

Is doubly linked list a non linear data structure or linear data structure?

In a linear data structure the data elements are traversed sequentially, and only one data element can be reached directly from the current one. Ex: Arrays, Linked Lists.
But in a doubly linked list we can reach two data elements, using the previous pointer and the next pointer.
So can we say that a doubly linked list is a non-linear data structure?
Correct me if I am wrong.
Thank you.
Non-linear data structures are those in which the elements are arranged in a non-linear fashion, requiring a representation of two or more dimensions. The elements may or may not (mostly not) be stored in contiguous memory locations; they can be placed in any order, as if the elements in between had been skipped. The elements are also accessed in an out-of-order pattern.
Example: a tree. Here one may iterate from the root to its right child, then to that node's right child, and so on, thereby skipping all the left nodes.
But in a doubly linked list you can only move sequentially (linearly), either forward (using the next pointer) or backward (using the previous pointer).
You can't jump from any element in the list to a distant element without traversing the intermediate elements.
Hence, a doubly linked list is a linear data structure. In a linear data structure, the elements are arranged in a linear fashion (that is, a one-dimensional representation).
You are wrong; 2 justifications:
While you can get to 2 elements from any node, one of them was the one you used to get to this node, so you can only get to one new node from each.
It is still linear in that it has to be traversed sequentially, or in a line.
It is still sequential: you need to go over some elements in the list to get to a particular element, compared to an array where you can randomly access each element.
However, you can go linearly forwards or backwards, which may optimize the search.
A linked list is basically a linear data structure because it stores data in a linear fashion. A linear data structure is one that stores data in a linear format and is traversed sequentially rather than in a zigzag way.
It depends on where you intend to apply linked lists. If you base it on storage, a linked list is considered non-linear. On the other hand, if you base it on access strategy, a linked list is considered linear.

Merge sort on a double linked list

Reading up on the good ways to sort a linked list (besides assigning an array and quick-sorting), it looks like mergesort is one of the better methods.
See: Merge Sort a Linked List
The current questions on this topic are non-specific as to whether the list is singly or doubly linked.
My question is:
Are there improved methods of merge sorting that take advantage of doubly linked lists?
(or is it just as good to use the same method as a single linked list and assign the previous link just to ensure the list remains valid)
To the best of my knowledge, there are no special time-saving tricks that arise when doing mergesort on a doubly-linked list that don't also work on a singly-linked list.
Mergesort was originally developed for data stored on reels of magnetic tape that could only be read in a forwards direction, and most standard implementations of mergesort only do forwards passes over the elements. The two major operations required by mergesort are splitting the input in half (hard in both singly- and doubly-linked lists) and merging elements back together (basically the same in both cases). As a result, without doing something substantially more clever than a standard mergesort, I would not expect there to be any major speedups.
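For what it's worth, here is a minimal sketch of the "same method as a singly linked list, then fix up the previous links" option, written in Python. `DNode`, `build` and `sort_doubly` are names invented for this example, and the top-down split is just one common way to write the forward-only sort.

```python
class DNode:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def _merge(a, b):
    """Merge two sorted runs using only the next links."""
    dummy = tail = DNode(None)
    while a is not None and b is not None:
        if b.value < a.value:
            tail.next, b = b, b.next
        else:
            tail.next, a = a, a.next
        tail = tail.next
    tail.next = a if a is not None else b
    return dummy.next

def _sort_forward(head):
    """Ordinary singly linked merge sort; prev links are ignored here."""
    if head is None or head.next is None:
        return head
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:   # find the middle
        slow, fast = slow.next, fast.next.next
    second = slow.next
    slow.next = None
    return _merge(_sort_forward(head), _sort_forward(second))

def sort_doubly(head):
    head = _sort_forward(head)
    prev, node = None, head
    while node is not None:          # one extra O(n) pass to repair prev links
        node.prev = prev
        prev, node = node, node.next
    return head

def build(values):
    """Build a doubly linked list from an iterable and return its head."""
    head = tail = None
    for v in values:
        n = DNode(v)
        if tail is None:
            head = n
        else:
            tail.next, n.prev = n, tail
        tail = n
    return head
```

The repair pass is linear, so it does not change the overall O(n log n) cost.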

merge sort algorithm for stream data

I am reading about merge sort at the link below:
http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_sorting.aspx
Merge sort's claim to fame is that it can easily be modified to handle
sequential data such as from a stream or a generator. Another huge
benefit is that when written carefully, merge sort does not require
that all items be present. It can sort an unknown number of items
coming in from a stream or generator, which is a very useful property.
My questions are
1. My understanding is that merge sort requires the complete array, because we have to divide the array in the middle, sort the halves independently, and then merge them. How does the merge sort algorithm work if not all items are present?
2. Can you give, in simple terms, an algorithm for using merge sort on items coming in from a stream?
The answers to 1 and 2 are somewhat related.
You can still perform a mergesort with an incomplete array, which would leave you with a sorted, partially complete array. The running time would still be O(n lg n). As for inserting the remaining items, you could either merge the partial array with the new items, or insert the new items one at a time, q.v. part 2 below. Inserting the remaining items one at a time would work best if the original array is nearly complete.
Assuming you are starting with a sorted array of numbers, as new numbers came in from the stream one-by-one, you would not have to run another mergesort. Instead, you could simply walk through the sorted array in O(n) time and insert each new item coming from the stream.
Merge sort works by dividing the list into subsets, sorting the subsets then putting it all back together. If the subset is one element long (the item being streamed in) then it is already sorted and just needs to be put in with the existing set of elements. Technically, I would call this insertion sort. The hard part here is determining where to put the new element. You could use an array which makes finding the right place quite easy but adding new items requires moving data around to make space (sometimes reallocating the array). Alternatively, you could store the data as a linked list so adding items is trivial but determining where to put new items is trickier. Swings and roundabouts.
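One possible way to sort a stream of unknown length with merges only is sketched below in Python. This is my own illustration, not the tutorial's code. `stream_merge_sort` is a hypothetical name, and `heapq.merge` is the standard library's merge of sorted iterables. Each incoming item is a sorted run of length 1, and runs of equal size are merged as they appear, so the total number of items never has to be known.

```python
from heapq import merge

def stream_merge_sort(stream):
    """Sort the items of any iterable without knowing its length in advance."""
    runs = []                        # runs[i] is None or a sorted list of 2**i items
    for item in stream:
        carry = [item]               # a single item is already sorted
        i = 0
        # Merge equal-sized runs upward, like carrying in a binary counter.
        while i < len(runs) and runs[i] is not None:
            carry = list(merge(runs[i], carry))
            runs[i] = None
            i += 1
        if i == len(runs):
            runs.append(carry)
        else:
            runs[i] = carry
    # The stream has ended: merge whatever runs are left, older runs first.
    result = []
    for run in runs:
        if run is not None:
            result = list(merge(run, result))
    return result
```

For example, `stream_merge_sort(iter([5, 3, 8, 1]))` consumes the iterator one item at a time and only does the final merges once it is exhausted.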

implement linked list using array - advantages & disadvantages

I know how to implement a linked list using an array. For example,
we define a struct as follows:
struct Node {
    int data;
    int link;
};
"data" stores the info and "link" stores the index in the array of next node.
Can anybody tell me what is the advantage and disadvantage of implementing a linked list using array compared to "ordinary" linked list? Any suggestion will be appreciated.
If you back a linked list with an array, you'll end up with the disadvantages of both. Consequently, this is probably not a very good way to implement it.
Some immediate disadvantages:
You'll have dead space in the array (entries which aren't currently used for items) taking up memory
You'll have to keep track of the free entries - after a few insertions and deletions, these free entries could be anywhere.
Using an array will impose an upper limit on the size of the linked list.
I suppose some advantages are:
If you're on a 64 bit system, your "pointers" will take up less space (though the extra space required by free entries probably outweighs this advantage)
You could serialise the array to disk and read it back in with an mmap() call easily. Though, you'd be better off using some sort of protocol buffer for portability.
You could make some guarantees about elements in the array being close to each other in memory.
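To show what "keeping track of the free entries" can look like, here is a minimal Python sketch of an array-backed list. The class name `ArrayLinkedList` and its methods are invented for this example. Unused slots are chained together through the same `link` array, so a freed slot is reused by the next insertion, and the fixed capacity is the upper limit mentioned above.

```python
class ArrayLinkedList:
    NIL = -1                                   # plays the role of a null pointer

    def __init__(self, capacity):
        self.data = [None] * capacity
        # Initially every slot is free: slot i links to slot i + 1.
        self.link = [i + 1 for i in range(capacity)]
        if capacity:
            self.link[-1] = self.NIL
        self.free = 0 if capacity else self.NIL  # head of the free list
        self.head = self.NIL                     # head of the real list

    def push_front(self, value):
        if self.free == self.NIL:
            raise MemoryError("backing array is full")
        idx = self.free                  # take a slot off the free list
        self.free = self.link[idx]
        self.data[idx] = value
        self.link[idx] = self.head       # link it in front of the list
        self.head = idx
        return idx

    def pop_front(self):
        if self.head == self.NIL:
            raise IndexError("list is empty")
        idx = self.head
        value = self.data[idx]
        self.head = self.link[idx]
        self.data[idx] = None
        self.link[idx] = self.free       # return the slot to the free list
        self.free = idx
        return value

    def __iter__(self):
        idx = self.head
        while idx != self.NIL:
            yield self.data[idx]
            idx = self.link[idx]
```

A growable version would have to copy the whole array when it fills up, which is the same trade-off a dynamic array has.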
Can anybody tell me what is the advantage and disadvantage of implementation of linked list using array compared to "ordinary" linked list?
linked lists have the following complexity:
cons x xs : O(1)
append n m : O(n)
index i xs : O(n)
if your representation uses a strict, contiguous array, you will have different complexity:
cons will require copying the old array: O(n)
append will require copying both arrays into a new contiguous space: O(n + m)
index can be implemented as array access: O(1)
That is, a linked list API implemented in terms of arrays will behave like an array.
You can mitigate this somewhat by using a linked list or tree of strict arrays, leading to ropes or finger trees or lazy sequences.
A stack can be implemented in two ways:
the first uses an array and the second uses a linked list.
The array version has some disadvantages, so most programmers use a linked list to implement a stack.
First, with a linked list the stack size does not have to be declared up front, and the amount of data stored in the stack is not limited. Second, the pointers in a linked list are easy to declare and use.
Only one pointer is needed in the linked list version; it is called the top pointer.
A stack works in LIFO (last in, first out) order. The linked list implementation has some disadvantages of its own,
but most programmers still implement stacks using a linked list.
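A minimal sketch of the linked list version in Python follows. `LinkedStack` and its method names are made up for this example. The only state is the top pointer, and no size is declared anywhere.

```python
class LinkedStack:
    class _Node:
        def __init__(self, value, below):
            self.value = value
            self.below = below       # link to the node underneath this one

    def __init__(self):
        self.top = None              # the single "top pointer"

    def is_empty(self):
        return self.top is None

    def push(self, value):
        # The new node becomes the top and points at the old top.
        self.top = LinkedStack._Node(value, self.top)

    def pop(self):
        if self.top is None:
            raise IndexError("pop from empty stack")
        value = self.top.value
        self.top = self.top.below    # the node underneath becomes the new top
        return value
```

Both `push` and `pop` touch only the top node, so each is O(1).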
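With the array implementation, the nodes sit in one contiguous block, so sequential access to the nodes of the list tends to be faster. On the other hand,
if you implement the linked list using pointers, the nodes can be anywhere in memory, and they can only be reached by following the links one after another.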
The array implementation is helpful when you are dealing with a fixed number of elements. Resizing an array is expensive in terms of performance, and if you need to insert or delete nodes in the middle of the list, you have to shift every node after that position.
By contrast, you should use the pointer implementation when you don't know how many nodes you will need. Such a list can grow and shrink efficiently, and no nodes need to be shifted; insertion and deletion are done simply by reassigning pointers.
