d-heaps deletion algorithm

Binary heaps are so simple that they are almost always used when
priority queues are needed. A simple generalization is a d-heap, which
is exactly like a binary heap except that all nodes have d children
(thus, a binary heap is a 2-heap).
Notice that a d-heap is much more shallow than a binary heap, improving the running time of inserts to O(log_d n). However, the delete_min operation is more expensive, because even though the tree is shallower, the minimum of d children must be found, which takes d - 1 comparisons using a standard algorithm. This raises the time for this operation to O(d log_d n). If d is a constant, both running times are, of course, O(log n).
My question is: for d children, shouldn't we need d comparisons? How did the author conclude that d - 1 comparisons suffice using a standard algorithm?
Thanks!

You need one fewer comparison than the number of children.
E.g. for two children a1 and a2 you compare a1 with a2 only once to find the smaller one.
For three children a1, a2, a3 you compare once to find the smaller of a1 and a2 and a second time to compare the smaller one to a3.
By induction you see that for each additional child you need an additional comparison, comparing the minimum of the previous list with the newly added child.
Thus, in general for d children you need d-1 comparisons to find the minimum.
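To make the counting concrete, here is a minimal sketch (not from the book) of that standard scan: one running candidate is kept, and each of the remaining children costs exactly one comparison, so d children cost d - 1 comparisons in total.

```cpp
#include <vector>
#include <cassert>

// Return the index of the smallest of d children.
// The loop body runs d - 1 times, so exactly d - 1 comparisons are made.
int min_child(const std::vector<int>& child) {
    assert(!child.empty());
    int best = 0;                                  // first child, no comparison yet
    for (int k = 1; k < (int)child.size(); ++k)    // d - 1 iterations
        if (child[k] < child[best])                // one comparison per iteration
            best = k;
    return best;
}
```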

Related

Why is the time complexity O(lgN) for an operation in weighted union find algorithm?

So everywhere I see weighted union find algorithm, they use this approach:
Maintain an array pointing to the parent node of a particular node
Maintain an array denoting the size of the tree a node is in
For union (p,q) merge the smaller tree with larger
Time complexity here is O(lgN)
Now an optimization on this is to flatten the trees, i.e., whenever I am calculating the root of a particular node, set all nodes on that path to point to that root.
Time complexity of this is O(lg*N)
This I can understand, but what I don't get is why don't they start off with an array/hashset wherein nodes point to the root (instead of the immediate parent node)? That would bring the time complexity down to O(1).
I am going to assume that the time complexity you are asking for is the time to check whether 2 nodes belong to the same set.
The key is in how sets are joined: specifically, you take the root of one set (the smaller one) and have it point to the root of the other set. Let the two sets have p and q as roots respectively, and let |p| denote the size of the set rooted at p; in general, it is the number of items whose path to the root goes through p (which is 1 plus the sizes of all its children's subtrees).
We can assume without loss of generality that |p| <= |q| (otherwise we just exchange their names). We then have |p ∪ q| = |p| + |q| >= 2|p|. So whenever a node's depth increases by one, the size of the tree it belongs to at least doubles; given N items, the depth is therefore at most 1 + lg N = O(lg N).
Even if the two chosen items are as far from their roots as possible, it takes O(lg N) operations to find the root of each of their sets, since you only need O(1) operations to move up one level, and then O(1) operations to compare the two roots.
This cost also applies to each union operation itself, since you need to find the two roots to merge. The reason we don't just have all nodes point directly to the root is several-fold. First, we would need to update every node in the (smaller) set each time we perform a union. Second, we only have edges pointing from the nodes toward the root and not the other way, so we would have to look through all nodes to find the ones we need to change. Next, there are good optimizations (such as the flattening you mention) that help with this kind of step and still work with the parent-pointer representation. Finally, you could do such a flattening pass at the end if you really need to, but it would cost O(N lg N) time, which is comparable to how long it takes to run the entire algorithm without the flattening shortcut.
You are correct that the solution you suggest will bring down the time complexity of a Find operation to O(1). However, doing so will make a Union operation slower.
Imagine that you use an array/hashtable to remember the representative (or the root, as you call it) of each node. When you perform a union operation between two nodes x and y, you would either need to update all the nodes with the same representative as x to have y's representative, or vice versa. This way, union runs in O(min{|Sx|, |Sy|}), where Sx is the set of nodes with the same representative as x. These sets can be significantly larger than log n.
The weighted union algorithm, on the other hand, takes O(log n) for both Find and Union.
So it's a trade-off. If you expect to do many Find operations, but few Union operations, you should use the solution you suggest. If you expect to do many of each, you can use the weighted union algorithm to avoid excessively slow operations.
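For reference, here is a minimal sketch of weighted quick-union as described above (union by size, no path compression); the class and member names are illustrative, not from any particular library.

```cpp
#include <vector>
#include <utility>

// Weighted quick-union: union by size, no path compression.
// The size argument above bounds the tree depth by O(log n),
// so find and unite are both O(log n).
struct WeightedUnionFind {
    std::vector<int> parent, size;

    explicit WeightedUnionFind(int n) : parent(n), size(n, 1) {
        for (int i = 0; i < n; ++i) parent[i] = i;   // each node is its own root
    }

    int find(int x) {                                // walk up to the root
        while (parent[x] != x) x = parent[x];
        return x;
    }

    void unite(int p, int q) {                       // merge smaller tree under larger
        int rp = find(p), rq = find(q);
        if (rp == rq) return;
        if (size[rp] < size[rq]) std::swap(rp, rq);
        parent[rq] = rp;
        size[rp] += size[rq];
    }

    bool connected(int p, int q) { return find(p) == find(q); }
};
```

Both find and unite touch at most O(log n) nodes, matching the bound argued above.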

Best sorting algorithm - Partially sorted linked list

Problem: Given a sorted doubly linked list and two numbers C and K, you need to decrease the value of the node with data K by C and insert the resulting node at its correct position so that the list remains sorted.
I would think of insertion sort for such a problem because, at any instant, insertion sort looks like a hand of cards
that is partially sorted. For insertion sort, the number of swaps equals the number of inversions, and the number of compares equals the number of exchanges plus (N - 1).
So, in the given problem, if the node with data K is decreased by C, the sorted linked list becomes partially sorted. Insertion sort seems the best fit.
Another point: when choosing a sorting algorithm, if a sorting approach is the best fit for the array representation of the data, then the same approach should also be the best fit for the linked-list representation of the same data.
For this problem, is my thought process correct in choosing insertion sort?
Maybe you mean something else, but insertion sort is not the best algorithm, because you actually don't need to sort anything. If there is only one element with value K then it doesn't make a big difference, but otherwise it does.
So I would suggest the following O(n) algorithm, ignoring edge cases for simplicity (see the sketch after these steps):
Go forward in the list until the value of the current node is > K - C.
Save this node, all the reduced nodes will be inserted before this one.
Continue to go forward while the value of the current node is < K
While the value of the current node is K, remove node, set value to K - C and insert it before the saved node. This could be optimized further, so that you only do one remove and insert operation of the whole sublist of nodes which had value K.
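A rough sketch of these steps, using std::list as a stand-in for the doubly linked list; the function name is mine, and it assumes C >= 0.

```cpp
#include <list>
#include <iterator>

// Decrease every node with value K by C and relink it at its sorted position.
// One forward pass, O(n) overall; no other nodes are copied or moved.
void decrease_and_reinsert(std::list<int>& lst, int K, int C) {
    // 1. Find the first node with value > K - C: the insertion point.
    auto dest = lst.begin();
    while (dest != lst.end() && *dest <= K - C) ++dest;

    // 2. Continue forward to the first node with value K (the list is sorted).
    auto it = dest;
    while (it != lst.end() && *it < K) ++it;

    // 3. Update each node with value K and splice it in front of the insertion point.
    while (it != lst.end() && *it == K) {
        auto next = std::next(it);
        *it = K - C;
        lst.splice(dest, lst, it);   // O(1) relink of a single node
        it = next;
    }
}
```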
If these decrease operations can be batched up before the sorted list must be available, then you can simply remove all the decremented nodes from the list. Then, sort them, and perform a two-way merge into the list.
If the list must be maintained in order after each node decrement, then there is little choice but to remove the decremented node and re-insert in order.
Doing this with a linear search is probably acceptable for a deck of cards, unless you're running some monstrous Monte Carlo simulation involving cards that runs for hours or days, so that the optimization counts.
Otherwise the way we would deal with the need to maintain order would be to use an ordered sequence data structure: balanced binary tree (red-black, splay) or a skip list. Take the node out of the structure, adjust value, re-insert: O(log N).
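As a concrete sketch of that last idea, with std::multiset standing in for the balanced tree or skip list (the function name is mine):

```cpp
#include <set>

// Remove one element with value K, decrease it by C, and re-insert it.
// Each step is O(log N) in an ordered balanced structure.
void decrease_key(std::multiset<int>& s, int K, int C) {
    auto it = s.find(K);          // O(log N)
    if (it == s.end()) return;    // no element with value K
    s.erase(it);                  // erase exactly this one occurrence
    s.insert(K - C);              // re-insert at its sorted position
}
```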

Create a binary search tree with a better complexity

You are given a number which is the root of a binary search tree. Then you are given an array of N elements which you have to insert into the binary search tree. The time complexity is N^2 if the array is in the sorted order. I need to get the same tree structure in a much better complexity (say NlogN). I tried it a lot but wasn't able to solve it. Can somebody help?
I assume that all numbers are distinct (if it's not the case, you can use a pair (number, index) instead).
Let's assume that we want to insert an element X. If it's the smallest or the largest element so far, it's clear where it goes.
Let a = max{y : y in the tree and y < X} and b = min{y : y in the tree and y > X}. I claim that:
One of them is an ancestor of the other.
Either a doesn't have the right child or b doesn't have the left child.
Proof:
Suppose neither is an ancestor of the other, and let l = lca(a, b). As a is in l's left subtree and b is in l's right subtree, a < l < b. But l is in the tree and differs from X, so either l < X (contradicting the maximality of a) or l > X (contradicting the minimality of b). Contradiction.
Now let a be an ancestor of b. If b had a left child c, then a < c < b, which is impossible for the same reason (the other case is handled similarly).
So the solution goes like this:
Let's keep a set of the elements that are already in the tree (I mean an efficient set with a lower_bound operation, like std::set in C++ or TreeSet in Java).
Upon every insertion, find a and b as described above (in O(log N) time using the set's lower_bound operation). Exactly one of them is missing the appropriate child; that's where the new element goes.
The total time complexity is clearly O(N log N).
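Here is a rough sketch of this idea in C++, using std::map as the ordered set of already-inserted keys. The Node layout and function names are my own, and cleanup of the allocated nodes is omitted for brevity.

```cpp
#include <map>
#include <vector>
#include <iterator>

// Rebuild the same tree that naive insertion would produce, in O(N log N):
// for each new key x, locate its successor b (and implicitly its predecessor a)
// among the keys already inserted; exactly one of them has a free child slot
// on the side facing x, and that is where x attaches.
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

Node* build_bst(int root_key, const std::vector<int>& keys) {
    std::map<int, Node*> by_key;            // ordered set of inserted keys
    Node* root = new Node(root_key);
    by_key[root_key] = root;

    for (int x : keys) {
        Node* node = new Node(x);
        auto succ = by_key.upper_bound(x);  // b = smallest key > x, O(log N)
        if (succ != by_key.end() && succ->second->left == nullptr)
            succ->second->left = node;      // x becomes b's left child
        else
            std::prev(succ)->second->right = node;  // a = largest key < x
        by_key[x] = node;
    }
    return root;
}
```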
If you look up a word in a dictionary, you open the dictionary about halfway and look at the page. That then tells you if the search word is in the first or second half of the dictionary. Repeat, eliminating half the remaining words on each pass, and you soon narrow it down to a single word. 4 billion word dictionaries will take about 32 passes.
A binary search tree uses the same principle. Except as well as looking up, you can also insert. Insertion is O(log N), unless the tree becomes degenerate.
To prevent the tree going degenerate, you use a system of "red" and "black" nodes (the colours are just conventional), and you don't allow long runs of either colour. The full explanation is in my book, Basic Algorithms
http://www.lulu.com/spotlight/bgy1mm
An implementation is here
https://github.com/MalcolmMcLean/babyxrc/blob/master/src/rbtree.c
https://github.com/MalcolmMcLean/babyxrc/blob/master/src/rbtree.h
But you will need some explanation if you want to learn about red-black trees from it.

Build data structure to perform reverse and report queries in poly-logn time

Given a sequence of n numbers, {a1, a2, a3, …, an}. Build a data structure such that the following operations can be performed in poly-logn time.
Reverse(i, j):
Reverse all the elements in the range i to j, as shown below:
Original sequence: <…, a_{i-1}, a_i, a_{i+1}, …, a_{j-1}, a_j, a_{j+1}, …>
Sequence after the reverse: <…, a_{i-1}, a_j, a_{j-1}, …, a_{i+1}, a_i, a_{j+1}, …>
Report(i):
Report the i-th element in the sequence, i.e. ai.
Here, poly-logn means some power of log n. like log(n) · log(n) may be acceptable.
[Note: Thanks to Prof. Baswana for asking this question.]
I was thinking of using a binary tree, with a node augmented with a Left|Right indicator and the number of elements in this sub-tree.
If the indicator is set to Left then begin by reading the left child, then read the right one
Else (set to Right) then begin by reading the right child, then read the left one
The Report is fairly obvious: O(log n)
The Revert is slightly more complicated, and I am unsure if it'd really work.
The idea would be to "isolate" the sequence of elements to reverse in a particular sub-tree (the lowest possible). This subtree contains range [a..b] including [i..j]
Reverse the minimum sub-tree that contains this sequence (change of the indicator)
Apply the Revert operation to [a..i-1] and [j+1..b]
Not sure it really works though :/
EDIT:
The previous solution does not work :) I cannot come up with a solution that does not rearrange the tree, and the ones that do rearrange it do not respect the complexity requirements.
I'll leave this there in case it gives some idea to someone else, and I'll delete it afterward unless I find a solution myself.
Splay trees + your decorations get O(log n) amortized. The structural problems Matthieu encountered are dealt with by the fact that in O(log n) amortized time, we can change the root to any node we like.
(Note: this data structure is an important piece of local search algorithms for the Traveling Salesman Problem, where people have found that two- and three-level trees with high arity are more efficient in practice.)
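For illustration, here is a sketch of the same decoration idea on an implicit treap rather than a splay tree: a lazy "reversed" flag per node plays the role of the Left|Right indicator, and both operations run in O(log n) expected time. All names are my own.

```cpp
#include <cstdlib>
#include <utility>

// Implicit treap keyed by position, with a lazy reversal flag.
struct Node {
    int value;
    int priority;      // random heap priority for balancing
    int size;          // subtree size
    bool reversed;     // lazy flag: this subtree is logically reversed
    Node *left, *right;
    explicit Node(int v) : value(v), priority(std::rand()), size(1),
                           reversed(false), left(nullptr), right(nullptr) {}
};

int size(Node* t) { return t ? t->size : 0; }
void update(Node* t) { if (t) t->size = 1 + size(t->left) + size(t->right); }

void push(Node* t) {                  // push the lazy flag down one level
    if (t && t->reversed) {
        std::swap(t->left, t->right);
        if (t->left)  t->left->reversed  ^= true;
        if (t->right) t->right->reversed ^= true;
        t->reversed = false;
    }
}

// Split t into the first k elements (a) and the rest (b).
void split(Node* t, int k, Node*& a, Node*& b) {
    if (!t) { a = b = nullptr; return; }
    push(t);
    if (size(t->left) >= k) { split(t->left, k, a, t->left); b = t; }
    else { split(t->right, k - size(t->left) - 1, t->right, b); a = t; }
    update(t);
}

Node* merge(Node* a, Node* b) {
    if (!a || !b) return a ? a : b;
    push(a); push(b);
    if (a->priority > b->priority) { a->right = merge(a->right, b); update(a); return a; }
    b->left = merge(a, b->left); update(b); return b;
}

// Reverse(i, j): reverse positions i..j (0-based, inclusive).
Node* reverse(Node* t, int i, int j) {
    Node *a, *b, *c;
    split(t, i, a, b);
    split(b, j - i + 1, b, c);
    if (b) b->reversed ^= true;       // O(1): just flip the lazy flag
    return merge(a, merge(b, c));
}

// Report(i): return the element at position i (assumes 0 <= i < size(t)).
int report(Node* t, int i) {
    push(t);
    if (i < size(t->left)) return report(t->left, i);
    if (i == size(t->left)) return t->value;
    return report(t->right, i - size(t->left) - 1);
}
```

A sequence is built by merging singleton nodes, e.g. `Node* root = nullptr; for (int v : a) root = merge(root, new Node(v));`; both Reverse and Report then touch O(log n) nodes in expectation.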

Pairwise priority queue

I have a set of A's and a set of B's, each with an associated numerical priority, where each A may match some or all B's and vice versa, and my main loop basically consists of:
Take the best A and B in priority order, and do stuff with A and B.
The most obvious way to do this is with a single priority queue of (A,B) pairs, but if there are 100,000 A's and 100,000 B's then the set of O(N^2) pairs won't fit in memory (and disk is too slow).
Another possibility is for each A, loop through every B. However this means that global priority ordering is by A only, and I really need to take priority of both components into account.
(The application is theorem proving, where the above options are called the pair algorithm and the given clause algorithm respectively; the shortcomings of each are known, but I haven't found any reference to a good solution.)
Some kind of two layer priority queue would seem indicated, but it's not clear how to do this without using either O(N^2) memory or O(N^2) time in the worst case.
Is there a known method of doing this?
Clarification: each A must be processed with all corresponding B's, not just one.
Maybe there's something I'm not understanding, but why not keep the A's and B's in separate heaps, get_Max on each of the heaps, do your work, remove each max from its associated heap, and continue?
You could handle the best pairs first, and if nothing good comes up mop up the rest with the given clause algorithm for completeness' sake. This may lead to some double work, but I'd bet that this is insignificant.
Have you considered ordered paramodulation or superposition?
It appears that the items in A have an individual priority, the items in B have an individual priority, and the (A,B) pairs have a combined priority. Only the combined priority matters, but hopefully we can use the individual priorities along the way. However, there is also a matching relation between items in A and items in B that is independent of priority.
I assume that for all a in A and b1, b2 in B such that Match(a,b1) and Match(a,b2), Priority(b1) >= Priority(b2) implies CombinedPriority(a,b1) >= CombinedPriority(a,b2).
Now, begin by sorting B in decreasing order of priority. Let B(j) indicate the jth element in this sorted order. Also, let A(i) indicate the ith element of A (which may or may not be in sorted order).
Let nextb(i,j) be a function that finds the smallest j' >= j such that Match(A(i),B(j')). If no such j' exists, the function returns null (or some other suitable error value). Searching for j' may just involve looping upward from j, or we may be able to do something faster if we know more about the structure of the Match relation.
Create a priority queue Q containing (i,nextb(i,0)) for all indices i in A such that nextb(i,0) != null. The pairs (i,j) in Q are ordered by CombinedPriority(A(i),B(j)).
Now just loop until Q is empty. Pull out the highest-priority pair (i,j) and process (A(i),B(j)) appropriately. Then re-insert (i,nextb(i,j+1)) into Q (unless nextb(i,j+1) is null).
Altogether, this takes O(N^2 log N) time in the worst case that all pairs match. In general, it takes O(N^2 + M log N), where M is the number of matches. The N^2 component can be reduced if there is a faster way of calculating nextb(i,j) than just looping upward, but that depends on knowledge of the Match relation.
(In the above analysis, I assumed both A and B were of size N. The formulas could easily be modified if they are different sizes.)
You seemed to want something better than O(N^2) time in the worst case, but if you need to process every match, then you have a lower bound of M, which can be N^2 itself. I don't think you're going to be able to do better than O(N^2 log N) time unless there is some special structure to the combined priority that lets you use a better-than-log-N priority queue.
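A sketch of this scheme, with placeholder types and the Match/CombinedPriority/nextb names taken from the description above; B is assumed to be pre-sorted by decreasing priority, and the monotonicity assumption stated earlier is what makes the per-A cursor correct.

```cpp
#include <queue>
#include <vector>
#include <functional>

using Item = int;   // stand-in for whatever A and B actually contain

struct PairScheduler {
    std::vector<Item> A;
    std::vector<Item> B;   // assumed sorted by decreasing priority
    std::function<bool(const Item&, const Item&)> match;
    std::function<double(const Item&, const Item&)> combined_priority;

    // nextb(i, j): smallest j' >= j with Match(A[i], B[j']), or -1 if none.
    int nextb(int i, int j) const {
        for (int jp = j; jp < (int)B.size(); ++jp)
            if (match(A[i], B[jp])) return jp;
        return -1;
    }

    // Process every matching pair, best combined priority first,
    // keeping only one queue entry per element of A at a time.
    void run(const std::function<void(const Item&, const Item&)>& process) {
        struct Entry { double prio; int i, j; };
        auto cmp = [](const Entry& x, const Entry& y) { return x.prio < y.prio; };
        std::priority_queue<Entry, std::vector<Entry>, decltype(cmp)> q(cmp);

        for (int i = 0; i < (int)A.size(); ++i) {        // seed with (i, nextb(i, 0))
            int j = nextb(i, 0);
            if (j != -1) q.push({combined_priority(A[i], B[j]), i, j});
        }
        while (!q.empty()) {
            Entry e = q.top(); q.pop();
            process(A[e.i], B[e.j]);                     // best remaining pair
            int j = nextb(e.i, e.j + 1);                 // advance i's cursor in B
            if (j != -1) q.push({combined_priority(A[e.i], B[j]), e.i, j});
        }
    }
};
```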
So you have a set of A's and a set of B's, and you need to pick an (A, B) pair from these sets such that some f(a, b) is at least as high as that of any other (A, B) pair.
This means you can either store all possible (A, B) pairs and order them, and just pick the highest each time through the loop (O(1) per iteration but O(N*M) memory).
Or you could loop through all possible pairs and keep track of the current maximum and use that (O(N*M) per iteration, but only O(N+M) memory).
If I am understanding you correctly this is what you are asking.
I think it very much depends on f() to determine if there is a better way to do it.
If f(a, b) = a + b, then it is obviously very simple, the highest A, and the highest B are what you want.
I think your original idea will work, you just need to keep your As and Bs in separate collections and just stick references to them in your priority queue. If each reference takes 16 bytes (just to pick a number), then 10,000,000 A/B references will only take ~300M. Assuming your As and Bs themselves aren't too big, it should be workable.

Resources