Heapsort and building heaps using linked list - algorithm

I know that linked list is not a appropriate data structure for building heaps.
One of the answers here (https://stackoverflow.com/a/14584517/5841727) says that heap sort can be done in O(nlogn) using linked list which is same as with arrays.
I think that heapify operation would cost O(n) time in linked list and we would need (n/2) heapify operations leading to time complexity of O(n^2).
Can someone please tell how to achieve O(nlogn) complexity (for heap sort ) using linked list ?

Stackoverflow URL you mentioned is merely someone's claim (at least when I'm writing this) so based on assumption here is my answer. Mostly when people mention "Time complexity", they mean asymptomatic analysis and finding out the proportion to which time taken by algorithm would increase with increasing size of input ignoring all the constants.
To prove the time complexity with linkedlist lets assume there is a function which returns value for given index (i know linked list don't return by index). For efficiency of this function you'd also need to pass in level but we can ignore that for now since it doesn't have any impact on time complexity.
So now it comes down to analyzing time proportion impact on this function with increasing input size. Now you could imagine that for fixing (heapifying) one node you may have to traverse 3 times max (1. find out which one to swap with requires one traverse to compare one of two possible children, 2. going back to parent for swaping 3. coming back down to the one you just swapped). Now even though it may seem that you are doing max n/2 traversal for 3 times; for asymptomatic analysis this is just equals to 'n'. Now you'll have to do this for log n exactly same way how you do for an array. Therefore time-complexity is O(n log n). On wikipedia time-complexity table for heaps URL https://en.wikipedia.org/wiki/Binary_heap#Summary_of_running_times

Related

Can my algorithm be done any better?

I have been presented with a challenge to make the most effective algorithm that I can for a task. Right now I came to the complexity of n * logn. And I was wondering if it is even possible to do it better. So basically the task is there are kids having a counting out game. You are given the number n which is the number of kids and m which how many times you skip someone before you execute. You need to return a list which gives the execution order. I tried to do it like this you use skip list.
Current = m
while table.size>0:
executed.add(table[current%table.size])
table.remove(current%table.size)
Current += m
My questions are is this correct? Is it n*logn and can you do it better?
Is this correct?
No.
When you remove an element from the table, the table.size decreases, and current % table.size expression generally ends up pointing at another irrelevant element.
For example, 44 % 11 is 0 but 44 % 10 is 4, an element in a totally different place.
Is it n*logn?
No.
If table is just a random-access array, it can take n operations to remove an element.
For example, if m = 1, the program, after fixing the point above, would always remove the first element of the array.
When an array implementation is naive enough, it takes table.size operations to relocate the array each time, leading to a total to about n^2 / 2 operations in total.
Now, it would be n log n if table was backed up, for example, by a balanced binary search tree with implicit indexes instead of keys, as well as split and merge primitives. That's a treap for example, here is what results from a quick search for an English source.
Such a data structure could be used as an array with O(log n) costs for access, merge and split.
But nothing so far suggests this is the case, and there is no such data structure in most languages' standard libraries.
Can you do it better?
Correction: partially, yes; fully, maybe.
If we solve the problem backwards, we have the following sub-problem.
Let there be a circle of k kids, and the pointer is currently at kid t.
We know that, just a moment ago, there was a circle of k + 1 kids, but we don't know where, at which kid x, the pointer was.
Then we counted to m, removed the kid, and the pointer ended up at t.
Whom did we just remove, and what is x?
Turns out the "what is x" part can be solved in O(1) (drawing can be helpful here), so the finding the last kid standing is doable in O(n).
As pointed out in the comments, the whole thing is called Josephus Problem, and its variants are studied extensively, e.g., in Concrete Mathematics by Knuth et al.
However, in O(1) per step, this only finds the number of the last standing kid.
It does not automatically give the whole order of counting the kids out.
There certainly are ways to make it O(log(n)) per step, O(n log(n)) in total.
But as for O(1), I don't know at the moment.
Complexity of your algorithm depends on the complexity of the operations
executed.add(..) and table.remove(..).
If both of them have complexity of O(1), your algorithm has complexity of O(n) because the loop terminates after n steps.
While executed.add(..) can easily be implemented in O(1), table.remove(..) needs a bit more thinking.
You can make it in O(n):
Store your persons in a LinkedList and connect the last element with the first. Removing an element costs O(1).
Goging to the next person to choose would cost O(m) but that is a constant = O(1).
This way the algorithm has the complexity of O(n*m) = O(n) (for constant m).

Data structure with O(1) insertion and O(log(n)) search complexity?

Is there any data structure available that would provide O(1) -- i.e. constant -- insertion complexity and O(log(n)) search complexity even in the worst case?
A sorted vector can do a O(log(n)) search but insertion would take O(n) (taken the fact that I am not always inserting the elements either at the front or the back). Whereas a list would do O(1) insertion but would fall short of providing O(log(n)) lookup.
I wonder whether such a data structure can even be implemented.
Yes, but you would have to bend the rules a bit in two ways:
1) You could use a structure that has O(1) insertion and O(1) search (such as the CritBit tree, also called bitwise trie) and add artificial cost to turn search into O(log n).
A critbit tree is like a binary radix tree for bits. It stores keys by walking along the bits of a key (say 32bits) and use the bit to decide whether to navigate left ('0') or right ('1') at every node. The maximum complexity for search and insertion is both O(32), which becomes O(1).
2) I'm not sure that this is O(1) in a strict theoretical sense, because O(1) works only if we limit the value range (to, say, 32 bit or 64 bit), but for practical purposes, this seems a reasonable limitation.
Note that the perceived performance will be O(log n) until a significant part of the possible key permutations are inserted. For example, for 16 bit keys you probably have to insert a significant part of 2^16 = 65563 keys.
No (at least in a model where the elements stored in the data structure can be compared for order only; hashing does not help for worst-case time bounds because there can be one big collision).
Let's suppose that every insertion requires at most c comparisons. (Heck, let's make the weaker assumption that n insertions require at most c*n comparisons.) Consider an adversary that inserts n elements and then looks up one. I'll describe an adversarial strategy that, during the insertion phase, forces the data structure to have Omega(n) elements that, given the comparisons made so far, could be ordered any which way. Then the data structure can be forced to search these elements, which amount to an unsorted list. The result is that the lookup has worst-case running time Omega(n).
The adversary's goal is to give away as little information as possible. Elements are sorted into three groups: winners, losers, and unknown. Initially, all elements are in the unknown group. When the algorithm compares two unknown elements, one chosen arbitrarily becomes a winner and the other becomes a loser. The winner is deemed greater than the loser. Similarly, unknown-loser, unknown-winner, and loser-winner comparisons are resolved by designating one of the elements a winner and the other a loser, without changing existing designations. The remaining cases are loser-loser and winner-winner comparisons, which are handled recursively (so the winners' group has a winner-unknown subgroup, a winner-winners subgroup, and a winner-losers subgroup). By an averaging argument, since at least n/2 elements are compared at most 2*c times, there exists a subsub...subgroup of size at least n/2 / 3^(2*c) = Omega(n). It can be verified that none of these elements are ordered by previous comparisons.
I wonder whether such a data structure can even be implemented.
I am afraid the answer is no.
Searching OK, Insertion NOT
When we look at the data structures like Binary search tree, B-tree, Red-black tree and AVL tree, they have average search complexity of O(log N), but at the same time the average insertion complexity is same as O(log N). Reason is obvious, the search will follow (or navigate through) the same pattern in which the insertion happens.
Insertion OK, Searching NOT
Data structures like Singly linked list, Doubly linked list have average insertion complexity of O(1), but again the searching in Singly and Doubly LL is painful O(N), just because they don't have any indexing based element access support.
Answer to your question lies in the Skiplist implementation, which is a linked list, still it needs O(log N) on average for insertion (when lists are expected to do insertion in O(1)).
On closing notes, Hashmap comes very close to meet the speedy search and speedy insertion requirement with the cost of huge space, but if horribly implemented, it can result into a complexity of O(N) for both insertion and searching.

Complexity of maintaining a sorted list vs inserting all values then sorting

Would the time and space complexity to maintain a list of numbers in sorted order (i.e start with the first one insert it, 2nd one comes along you insert it in sorted order and so on ..) be the same as inserting them as they appear and then sorting after all insertions have been made?
How do I make this decision? Can you demonstrate in terms of time and space complexity for 'n' elements?
I was thinking in terms of phonebook, what is the difference of storing it in a set and presenting sorted data to the user each time he inserts a record into the phonebook VS storing the phonebook records in a sorted order in a treeset. What would it be for n elements?
Every time you insert into a sorted list and maintain its sortedness, it is O(logn) comparisons to find where to place it but O(n) movements to place it. Since we insert n elements this is O(n^2). But, I think that if you use a data structure that is designed for inserting sorted data into (such as a binary tree) then do a pass at the end to turn it into a list/array, it is only O(nlogn). On the other hand, using such a more complex data structure will use about O(n) additional space, whereas all other approaches can be done in-place and use no additional space.
Every time you insert into an unsorted list it is O(1). Sorting it all at the end is O(nlogn). This means overall it is O(nlogn).
However, if you are not going to make lists of many elements (1000 or less) it probably doesn't matter what big-O it is, and you should either focus on what runs faster for small data sets, or not worry at all if it is not a performance issue.
It depends on what data structure you are inserting them in. If you are asking about inserting in an array, the answer is no. It takes O(n) space and time to store the n elements, and then O(n log n) to sort them, so O(n log n) total. While inserting into an array may require you to move \Omega(n) elements so takes \Theta(n^2). The same problem will be true with most "sequential" data structures. Sorry.
On the other hand, some priority queues such as lazy leftist heaps, fibonacci heaps, and Brodal queues have O(1) insert. While, a Finger Tree gives O(n log n) insert AND linear access (Finger trees are as good as a linked list for what a linked list is good for and as good as balanced binary search trees for what binary search trees are good for--they are kind of amazing).
There are going to be application-specific trade-offs to algorithm selection. The reasons one might use an insertion sort rather than some kind of offline sorting algorithm are enumerated on the Insertion Sort wikipedia page.
The determining factor here is less likely to be asymptotic complexity and more likely to be what you know about your data (e.g., is it likely to be already sorted?)
I'd go further, but I'm not convinced that this isn't a homework question asked verbatim.
Option 1
Insert at correct position in sorted order.
Time taken to find the position for i+1-th element :O(logi)
Time taken to insert and maintain order for i+1-th element: O(i)
Space Complexity:O(N)
Total time:(1*log 1 +2*log 2 + .. +(N-1)*logN-1) =O(NlogN)
Understand that this is just the time complexity.The running time can be very different from this.
Option 2:
Insert element O(1)
Sort elements O(NlogN)
Depending on the sort you employ the space complexity varies, though you can use something like quicksort, which doesn't need much space anyway.
In conclusion though both time complexity are the same, the bounds are weak and mathematically you can come up with better bounds.Also note that worst case complexity may never be encountered in practical situations, probably you will see only average cases all the time.If performance is such a vital issue in your application, you should test both sets of code on random sampling.Do tell me which one works faster after your tests.My guess is option 1.

Algorithmic complexity: why does ordering reduce complexity to O(log n)

I'm reading some texts about algorithmic complexity (and I'm planning to take an algorithms course later), but I don't understand the following.
Say I've to search for an item in an unordered list, the number of steps it takes to find it would be proportional to the number of items on that list. Finding it in a list of 10 items could take 10 steps, doing the same for a list of 100000 items could take 100000 steps. So the algorithmic complexity would be linear, denoted by 'O(n)'.
Now, this text[1] tells me if I were to sort the list by some property, say a social security number, the algorithmic complexity of finding an item would be reduced to O(log n), which is a lot faster, off course.
Now I can see this happening in case of a b-tree, but how does this apply to a list? Do I misunderstand the text, since English isn't my native language?
[1]http://msdn.microsoft.com/en-us/library/ms379571.aspx
This works for any container that is randomly accessible. In the case of a list you would go to the middle element first. Assuming that's not the target, the ordering tells you if the target will be in the upper sublist or the lower sublist. This essentially turns into a binary search, no different than searching through a b-tree.
Binary search, check middle if target is higher it must reside in the right side, if less its the middle number and so on. Each time you divide the list in two which leaves you with O(log n)

Best continuously sorting algorithm?

I have a set of double-precision data and I need their list to be always sorted. What is the best algorithm to sort the data as it is being added?
As best I mean least Big-O in data count, Small-O in data count (worst case scenario), and least Small-O in the space needed, in that order if possible.
The set size is really variable, from a small number (30) to lots of data (+10M).
Building a self-balancing binary tree like a red-black tree or AVL tree will allow for Θ(lg n) insertion and removal, and Θ(n) retrieval of all elements in sorted order (by doing a depth-first traversal), with Θ(n) memory usage. The implementation is somewhat complex, but they're efficient, and most languages will have library implementations, so they're a good first choice in most cases.
Additionally, retreiving the i-th element can be done by annotating each edge (or, equivalently, node) in the tree with the total number of nodes below it. Then one can find the i-th element in Θ(lg n) time and Θ(1) space with something like:
node *find_index(node *root, int i) {
while (node) {
if (i == root->left_count)
return root;
else if (i < root->left_count)
root = root->left;
else {
i -= root->left_count + 1;
root = root->right;
}
}
return NULL; // i > number of nodes
}
An implementation that supports this can be found in debian's libavl; unfortunately, the maintainer's site seems down, but it can be retrieved from debian's servers.
The structure that is used for indexes of database programs is a B+ Tree. It is a balanced bucketed n-ary tree.
From Wikipedia:
For a b-order B+ tree with h levels of index:
The maximum number of records stored is n = b^h
The minimum number of keys is 2(b/2)^(h−1)
The space required to store the tree is O(n)
Inserting a record requires O(log-b(n)) operations in the worst case
Finding a record requires O(log-b(n)) operations in the worst case
Removing a (previously located) record requires O(log-b(n)) operations in the worst case
Performing a range query with k elements occurring within the range requires O(log-b(n+k)) operations in the worst case.
I use this in my program. You can add your data to the structure as it comes and you can always traverse it in order, front to back or back to front, or search quickly for any value. If you don't find the value, you will have the insertion point where you can add the value.
You can optimize the structure for your program by playing around with b, the size of the buckets.
An interesting presentation about B+ trees: Tree-Structured Indexes
You can get the entire code in C++.
Edit: Now I see your comment that your requirement to know the "i-th sorted element in the set" is an important one. All of a sudden, that makes many data structures less than optimal.
You are probably best off with a SortedList or even better, a SortedDictionary. See the article: Squeezing more performance from SortedList. Both structures have a GetKey function that will return the i-th element.
Likely a heap sort. Heaps are only O(log N) to add new data, and you can pop off the net results at any time in O(N log N) time.
If you always need the whole list sorted every time, then there's not many other options than an insertion sort. It will likely be O(N^2) though with HUGE hassle of linked skip lists you can make it O(N log N).
I would use a heap/priority queue. Worst case is same as average case for runtime. Next element can be found in O(log n) time.
Here is a templatized C# implementation that I derived from this code.
If you just need to know the ith smallest element as it says in the comments, use the BFPRT algorithm which is named after the last names of the authors: Blum, Floyd, Pratt, Rivest, and Tarjan and is generally agreed to be the biggest concentration of big computer science brains in the same paper. O(n) worst-case.
Ok, you want you data sorted, but you need to extract it via an index number.
Start with a basic Tree such as the afforementioned Red-Black trees.
Modify the tree algo such that as you insert elements into the tree all nodes encountered during insertion and deletion keep a count of the number of elements under each branch.
Then when you are extracting data from the tree you can calculate the index as you go, and know which branch to take based on whether is greater or less than the index you are trying to extract.
One other consideration. 10M elements+ in a tree that uses dynamic memory allocation will suck up alot of memory overhead. i.e. The pointers may take up more space than your actual data, plus whatever other member is used to implement the data structure. This will lead to serious memory fragmentation, and in the worst cases, degrade the system's overall performance. (Churning data back and forth from virtual memory.) You might want to consider implementing a combination of block and dynamic memory allocation. Something where in you sort the tree into blocks of data, thus reducing the memory overhead.
Check out the comparison of sorting algorithms in Wikipedia.
Randomized Jumplists are interesting as well.
They require less space as BST and skiplists.
Insertion and deletion is O(log n)
By a "set of double data," do you mean a set of real-valued numbers? One of the more commonly used algorithms for that is a heap sort, I'd check that out. Most of its operations are O( n * log(n) ), which is pretty good but doesn't meet all of your criteria. The advantages of heapsort is that it's reasonably simple to code on your own, and many languages provide libraries to manage a sorted heap.

Resources