Sloppy Heap Sort - algorithm

Has anyone ever heard of this heap repair technique: SloppyHeapSort? It uses a "Sloppy" sift-down approach. Basically, it takes the element to be repaired, moves it to the bottom of the heap (without comparing it to its children) by replacing it with its larger child until it hits the bottom. Then, sift-up is called until it reaches its correct location. This makes just over lg n comparisons (in a heap of size n).
However, this cannot be used for heap construction, only for heap repair. Why is this? I don't understand why it wouldn't work if you were trying to build a heap.

The algorithm, if deployed properly, could certainly be used as part of the heap construction algorithm. It is slightly complicated by the fact that during heap construction, the root of the subheap being repaired is not the beginning of the array, which affects the implementation of sift-up (it needs to stop when it reaches the root of the subheap being repaired, rather than continuing up to the top of the whole heap).
It should be noted that the algorithm has the same asymptotic performance as the standard heap-repair algorithm; however, it probably involves fewer comparisons. In part, this is because the standard heap-repair algorithm is called after swapping the root of the heap (the largest element) for the last element in the heap array.
The last element is not necessarily the smallest element in the heap, but it is certainly likely to be close to the bottom. After the swap, the standard algorithm will move the swapped element down as many as log2N times, with each step requiring two comparisons; because the element is likely to belong near the bottom of the heap, most of the time the maximum number of comparisons will be performed. But occasionally, only two or four comparisons might be performed.
The "sloppy" algorithm instead starts by moving the "hole" from the top of the heap to somewhere near the bottom (log2N comparisons) and then moving the last element up until it finds its home, which will usually take only a few comparisons (but could, in the worst case, take nearly log2N comparisons).
Now, in the case of heapify, heap repair is performed not with the last element in the subheap, but rather with a previously unseen element taken from the original vector. This actually doesn't change the average performance analysis much, because if you start heap repair with a random element, instead of an element likely to be small, the expected number of sift-down operations is still close to the maximum. (Half of the heap is in the last level, so the probability of needing the maximum number of sift-downs for a random element is one-half.)
While the sloppy algorithm (probably) improves the number of element comparisons, it increases the number of element moves. The classic algorithm performs at most log2N swaps, while the sloppy algorithm performs at least log2N swaps, plus the additional ones during sift-up. (In both cases, the swaps can be improved to moves by not inserting the new element until its actual position is known, halving the number of memory stores.)
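For concreteness, here is a minimal C sketch of the sloppy repair as described above (my own illustration, not taken from any reference). The root parameter covers the heap-construction case mentioned earlier, where sift-up must stop at the root of the subheap being repaired, and the code uses moves rather than swaps, per the parenthetical above.

    #include <stddef.h>

    /* "Sloppy" repair of a max-heap stored in a[0..n-1]: place the value x,
     * starting from the empty slot at index root (root == 0 after removing
     * the old top).  Phase 1 pushes the hole down to a leaf by promoting the
     * larger child at each level (one comparison per level, none against x);
     * phase 2 sifts x back up until its parent is no smaller, stopping at
     * the subheap root. */
    static void sloppy_sift(int *a, size_t n, size_t root, int x)
    {
        size_t hole = root;

        /* Phase 1: walk the hole to the bottom. */
        for (;;) {
            size_t child = 2 * hole + 1;           /* left child */
            if (child >= n)
                break;                             /* the hole is a leaf */
            if (child + 1 < n && a[child + 1] > a[child])
                child++;                           /* pick the larger child */
            a[hole] = a[child];                    /* promote it into the hole */
            hole = child;
        }

        /* Phase 2: sift x up until the heap property holds again. */
        while (hole > root) {
            size_t parent = (hole - 1) / 2;
            if (a[parent] >= x)
                break;
            a[hole] = a[parent];
            hole = parent;
        }
        a[hole] = x;
    }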
As a postscript, I wasn't able to find any reference to your "sloppy" algorithm. On the whole, when asking about a proposed algorithm it is generally better to include a link.

There is a linear-time algorithm to construct a heap. I believe what the author meant is that using this approach to build a heap is not efficient and that better algorithms exist. Of course you can build a heap by adding the elements one by one using the described strategy - you can simply do better.

Related

Delete and Increase key for Binomial heap

I am currently studying the binomial heap.
I learned that the following operations on binomial heaps can be completed in Theta(log n) time:
Get-max
Insert
Extract Max
Merge
Increase-Key
Delete
But the two operations Increase-Key and Delete are said to need a pointer to the element in order to complete in Theta(log n).
Here are 3 questions I want to ask:
Is this because, if Increase-Key and Delete are not given a pointer to the element, they have to search for it before the operation can take place?
What is the time complexity of the search operation on a binomial heap? (I believe it is O(n).)
If the pointer to the element is not given for the Increase-Key and Delete operations, do those two operations take O(n) time, or can it be lower than that?
It’s good that you’re thinking about this!
Yes, that’s exactly right. The nodes in a binomial heap are organized in a way that makes it very quick to find the maximum value, but the remaining elements are not kept in any order that makes it easy to find a particular one.
There isn’t a general way to search a binomial heap for an element faster than O(n). Or, stated differently, the worst-case cost of any way of searching a binomial heap is Ω(n). Here’s one way to see this. Form a binomial heap where n-1 items have priority 137 and one item has priority 42. The item with priority 42 must be a leaf node. There are (roughly) n/2 leaves in the heap, and since there is no ordering on them, to find that one item you’d potentially have to look at all the leaves. To formalize this, you could form multiple different binomial heaps with these items, and whatever algorithm was looking for the item of priority 42 would necessarily have to find it in the last place it looks at least once.
For the reasons given above, no, there’s no way to implement those operations quickly without having pointers to the elements in question, since in the worst case you’d have to search the whole heap.

What is O(1) space complexity?

I am having a hard time understanding what O(1) space complexity is. I understand that it means that the space required by the algorithm does not grow with the input or the size of the data on which we are using the algorithm. But what does it mean exactly?
If we use an algorithm on a linked list, say 1->2->3->4, and to traverse the list to reach "3" we declare a temporary pointer and traverse the list until we reach 3, does this mean we still use only O(1) extra space? Or does it mean something completely different? I am sorry if this does not make sense at all. I am a bit confused.
To answer your question: if you have an algorithm for traversing the list which allocates a single pointer to do so, the traversal algorithm is considered to have O(1) space complexity. Moreover, if the traversal algorithm needed not 1 but 1000 pointers, the space complexity would still be considered O(1).
However, if for some reason the algorithm needs to allocate N pointers when traversing a list of size N, i.e., it needs 3 pointers for a list of 3 elements, 10 pointers for a list of 10 elements, 1000 pointers for a list of 1000 elements, and so on, then the algorithm is considered to have a space complexity of O(N). This is true even when N is very small, e.g., N = 1.
To summarise the two examples above, O(1) denotes constant space use: the algorithm allocates the same number of pointers irrespective of the list size. In contrast, O(N) denotes linear space use: the algorithm's space use grows with the input size.
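To make those two cases concrete, here is a hypothetical C sketch (the function names are my own): the first traversal uses a single pointer no matter how long the list is, while the second allocates one pointer per visited node, so its extra memory grows linearly with the length of the list.

    #include <stdlib.h>

    struct node {
        int value;
        struct node *next;
    };

    /* O(1) extra space: one temporary pointer, regardless of list length. */
    struct node *find(struct node *head, int target)
    {
        for (struct node *cur = head; cur != NULL; cur = cur->next)
            if (cur->value == target)
                return cur;
        return NULL;
    }

    /* O(N) extra space: records a pointer to every node it visits, so the
     * extra memory grows with the number of elements in the list.
     * (Error handling is kept minimal; this is only an illustration.) */
    struct node *find_and_record_path(struct node *head, int target)
    {
        size_t len = 0;
        for (struct node *cur = head; cur != NULL; cur = cur->next)
            len++;

        struct node **visited = malloc(len * sizeof *visited); /* N pointers */
        if (visited == NULL)
            return NULL;

        struct node *found = NULL;
        size_t i = 0;
        for (struct node *cur = head; cur != NULL; cur = cur->next) {
            visited[i++] = cur;
            if (cur->value == target) {
                found = cur;
                break;
            }
        }
        free(visited);
        return found;
    }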
Space complexity is just the amount of memory used by a program: the amount of main memory the algorithm requires to complete its execution, as a function of the input size.
The space complexity S(P) of an algorithm is the total space taken by the algorithm to complete its execution with respect to the input size. It includes both constant space and auxiliary space.
S(P) = constant space + auxiliary space
Constant space is the part that is fixed for the algorithm, generally equal to the space used by the input and local variables. Auxiliary space is the extra/temporary space used by the algorithm.
Let's say I create some data structure with a fixed size, and no matter what I do to the data structure, it will always have the same fixed size. Operations performed on this data structure are therefore O(1) in space.
As an example, say I have an array of fixed size 100. Any operation I do, whether that is reading from the array or updating an element, is O(1) in space: the array's size (and thus the amount of memory it is using) is not changing.
Another example: say I have a LinkedList and I keep adding elements to it. Every element I add grows the amount of memory required to hold the list, so holding N elements requires O(N) space.
Hope this helps!

Is a dynamically sized heap insertion technically O(n)?

Inserting an element into a heap involves appending it to the end of the array and then propagating it upwards until it's in the "right spot" and satisfies the heap property, which is an O(log n) operation.
However, in C, for instance, calling realloc in order to resize the array for the new element can (and likely will) result in having to copy the entirety of the array to another location in memory, which is O(n) in the best and worst case, right?
Are heaps in C (or any language, for that matter) usually done with a fixed, pre-allocated size, or is the copy operation inconsequential enough to make a dynamically sized heap a viable choice (e.g, a binary heap to keep a quickly searchable list of items)?
A typical scheme is to double the size when you run out of room. This doubling--and the copying that goes with it--does indeed take O(n) time.
However, notice that you don't have to perform this doubling very often. If you spread the total cost of all the doublings over all of the insertions performed on the heap, the cost per insertion is indeed inconsequential: constant, on average. (This kind of averaging is known as amortized analysis.)
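As a rough sketch of that scheme in C (illustrative only, not any particular library's implementation): the backing array doubles whenever it is full, so the occasional O(n) realloc is spread across many cheap insertions, while the sift-up contributes the usual O(log n).

    #include <stdlib.h>

    struct heap {
        int *data;
        size_t size;     /* number of elements currently stored */
        size_t capacity; /* number of allocated slots */
    };

    /* Insert into a max-heap backed by a growable array.  Doubling costs
     * O(n) when it happens, but it happens so rarely that the amortized
     * cost per push is O(1) for the resize plus O(log n) for the sift-up.
     * Returns 0 on success, -1 on allocation failure. */
    int heap_push(struct heap *h, int value)
    {
        if (h->size == h->capacity) {
            size_t new_cap = h->capacity ? 2 * h->capacity : 16;
            int *p = realloc(h->data, new_cap * sizeof *h->data);
            if (p == NULL)
                return -1;
            h->data = p;
            h->capacity = new_cap;
        }

        /* Sift the new element up: O(log n) comparisons. */
        size_t i = h->size++;
        while (i > 0) {
            size_t parent = (i - 1) / 2;
            if (h->data[parent] >= value)
                break;
            h->data[i] = h->data[parent];
            i = parent;
        }
        h->data[i] = value;
        return 0;
    }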

Complexity of search operation on a nedtrie (bitwise trie)

I recently heard about nedtries and decided to try implementing them, but something bothers me about the complexity of their search operation; I can't understand why they are supposed to be so fast.
From what I understood, the expected complexity of their search operation should be
O(m/2) with m the size of the key in bits.
If you compare it to the complexity of the search operation in a traditional binary tree,
you get:
log2(n) >= m/2
Let the key be 32 bits long: log2(n) >= 16 <=> n >= 65536
So nedtries should be faster than binary trees starting from 65536 items.
However, the author claims they are always faster than binary trees, so either my assumption about their complexity is wrong or the computations performed at each step of the search are vastly faster in a nedtrie.
So, what about it?
(Note I'm the author of nedtries). I think my explanation of complexity on the front of the nedtries page makes sense? Perhaps not.
The key you're missing is that it's the difference between bits which determines complexity. The more the difference, the lower the search cost, whereas the lower the difference, the higher the search cost.
The fact this works stems from modern out-of-order processors. As a gross simplification, if you avoid main memory your code runs about 40-80x faster than if you are dependent on main memory. That means you can execute 50-150 ops in the time it takes to load a single thing from memory. It also means you can do a bit scan and figure out which node you ought to look at next in not much more than the time it takes to load that node's cache line into memory.
This effectively removes the logic, the bit scanning and everything else from the complexity analysis. They could all be O(N^N) and it wouldn't matter. What matters now is that the selection of the next node to look at is effectively free, so the number of nodes which must be loaded for examination is the scaling constraint, and therefore the average number of nodes looked at, out of the total number of nodes, is the average complexity, because main memory's slowness is by far the biggest constraint.
Does this make sense? It means weirdnesses like if some bits are densely packed at one end of the key but loosely packed at the other end of the key, searches in the densely packed end will be very considerably slower (approaching O(log N) where N is the number of dense elements) than searches in the loosely packed end (approaching O(1)).
Someday soon I'll get round to adding new functions which take advantage of this feature of bitwise tries, so you can say "add this node to a loosely/densely packed space and return the key you chose" and all sorts of variations on that theme. Sadly, as always, it comes down to time and demands on one's time.
Niall
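As a loose illustration of the bit-scan point above (this is not the actual nedtries code), the "which bit differs first" decision amounts to a single bit scan over the XOR of two keys, a couple of cycles next to the hundreds of cycles a cache miss can cost:

    /* Index of the highest bit in which two 32-bit keys differ, or -1 if
     * they are equal.  __builtin_clz is a GCC/Clang intrinsic (undefined
     * for 0, hence the guard); the point is only that this per-node bit
     * arithmetic is essentially free next to a memory load. */
    static int highest_differing_bit(unsigned a, unsigned b)
    {
        unsigned diff = a ^ b;
        if (diff == 0)
            return -1;
        return 31 - __builtin_clz(diff);
    }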
If you have smaller trees, you can use smaller keys!

Algorithm for merging two max heaps?

Is there an efficient algorithm for merging 2 max-heaps that are stored as arrays?
It depends on what the type of the heap is.
If it's a standard binary heap, where every node has up to two children and the levels are filled so that the leaves are on at most two different rows, you cannot do better than O(n) for a merge.
Just put the two arrays together and create a new heap out of them which takes O(n).
For better merging performance, you could use another heap variant like a Fibonacci-Heap which can merge in O(1) amortized.
Update:
Note that it is worse to insert all elements of the first heap one by one into the second heap, or vice versa, since an insertion takes O(log(n)).
As your comment states, you don't seem to know how the heap is optimally built in the first place (again, for a standard binary heap):
Create an array and put the elements of both heaps into it in some arbitrary order
Now start at the lowest level. The lowest level contains trivial max-heaps of size 1, so this level is done
Move a level up. Whenever the heap condition of one of the "sub-heaps" is violated, swap the root of that sub-heap with its bigger child. Afterwards, level 2 is done
Move to level 3. Whenever the heap condition is violated, proceed as before: swap the violating node down with its bigger child, recursively, until everything satisfies the heap condition up to level 3
...
when you reach the top, you have created a new heap in O(n).
I omit a proof here, but the intuition is that most of the nodes sit in the bottom levels, where the "sub-heaps" are small and little content has to be swapped to re-establish the heap condition. That is much better than inserting every element into one of the heaps one at a time: each such insertion operates on the full height of the heap and takes O(log(n)) every time.
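A minimal C sketch of the merge-by-concatenation approach described above (illustrative, not a library routine): copy both arrays into one buffer, then heapify bottom-up, for O(n) total work.

    #include <stdlib.h>
    #include <string.h>

    /* Sift a[i] down within a max-heap of n elements. */
    static void sift_down(int *a, size_t n, size_t i)
    {
        for (;;) {
            size_t largest = i;
            size_t left = 2 * i + 1, right = 2 * i + 2;
            if (left < n && a[left] > a[largest])
                largest = left;
            if (right < n && a[right] > a[largest])
                largest = right;
            if (largest == i)
                return;
            int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;
            i = largest;
        }
    }

    /* Merge two max-heaps stored as arrays: concatenate them, then fix the
     * internal nodes from the last parent up to the root (bottom-up
     * heapify).  Total cost is O(n1 + n2).  The caller frees the result;
     * returns NULL on allocation failure. */
    int *merge_heaps(const int *h1, size_t n1, const int *h2, size_t n2)
    {
        size_t n = n1 + n2;
        int *a = malloc(n * sizeof *a);
        if (a == NULL)
            return NULL;
        memcpy(a, h1, n1 * sizeof *a);
        memcpy(a + n1, h2, n2 * sizeof *a);
        for (size_t i = n / 2; i-- > 0; )
            sift_down(a, n, i);
        return a;
    }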
Update 2: A binomial heap allows merging in O(log(n)) and would conform to your O(log(n)^2) requirement.
Two binary heaps of sizes n and k can be merged in O(log n * log k) comparisons. See
Jörg-R. Sack and Thomas Strothotte, An algorithm for merging heaps, Acta Informatica 22 (1985), 172-186.
I think what you're looking for in this case is a Binomial Heap.
A binomial heap is a collection of binomial trees, a member of the merge-able heap family. The worst-case running time for a union (merge) on 2+ binomial heaps with n total items in the heaps is O(lg n).
See http://en.wikipedia.org/wiki/Binomial_heap for more information.
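To see why the union is cheap, here is a hedged C sketch of its core step (the node layout and names are my own simplification of the standard presentation): two binomial trees of the same degree are linked in O(1); a full union merges the two root lists by degree and applies this step like binary addition with carries, which is where the O(lg n) bound comes from.

    struct bnode {
        int key;
        int degree;             /* number of children */
        struct bnode *child;    /* leftmost child */
        struct bnode *sibling;  /* next tree in the root list / next sibling */
    };

    /* Link two binomial trees of equal degree in O(1), max-heap order:
     * the root with the smaller key becomes the leftmost child of the
     * root with the larger key, and the degree goes up by one. */
    static struct bnode *link_trees(struct bnode *a, struct bnode *b)
    {
        if (a->key < b->key) {
            struct bnode *t = a;
            a = b;
            b = t;
        }
        b->sibling = a->child;
        a->child = b;
        a->degree++;
        return a;
    }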
