Violation in deterministic skip-list top-down insertion - data-structures

Suppose I'm given a skip-list, with an order of 3.
       HEAD
level 3  |------------------------------------------> X
level 2  |----------------------> [150] ------------> X
level 1  |--> [ 20] --> [100] --> [150] --> [200] --> X
minlimit = ceil(order/2) - 1 = 1
maxlimit = order - 1 = 2
So essentially it's a 1-2 skip-list.
If I want to insert 50 with the top-down insertion algorithm, it will raise the level of node 100 before dropping into the gap between HEAD and 150, and then insert 50 right before 100. Now a violation occurs: there are no nodes between 100 and 150, while there should be at least one node of height h-1 in that gap, since minlimit = 1.
What am I doing wrong?

If I want to insert 50 by the top-down insertion algorithm, it'll raise the level of node 100 before dropping into gap between Head and 150 and insert 50 right before 100
Why are you doing this?
The first reference I found for deterministic 1-2 skip lists (this paper, available as a PDF via your link) says:
As noted in [...], insertions in ... can be performed top-down, ... Adopting this approach, we insert an element in a 1-2-3 skip list by splitting any gap of size 3 into two gaps of size 1, when searching for the element to be inserted. We ensure in this way that the structure retains the gap invariant with or without the inserted element. To be more precise, we start our search at the header, and at level 1 higher than the height of the skip list. When we find the gap that we are going to drop, we look at the level below and if we see 3 nodes of the same height in a row, we raise the middle one; after that we drop down a level. When we reach the bottom level, we simply insert a new node of height 1.
According to this, you should start at level 3 and look at level 2 below. There are not 3 nodes of the same height in a row there (only the single node 150), so you don't need to raise anything. Now drop down to level 2, into the gap [HEAD, 150].
Does that start to address your confusion?
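To make the quoted top-down procedure concrete, here is a minimal Python sketch of a 1-2-3 skip list. It is loosely modelled on the right/down linked representation used in the Munro et al. paper, but the class and method names, the `math.inf` sentinel key and the `None` key on the tail are my own assumptions, not the paper's code. Before dropping into a gap it checks whether the gap on the level below already holds 3 nodes of the same height and, if so, raises the middle one; on the bottom level the same splice inserts the new key as a node of height 1.

```python
import math


class Node:
    """One cell of a level: `key` is the largest key in the gap it heads."""

    __slots__ = ("key", "right", "down")

    def __init__(self, key=None, right=None, down=None):
        self.key = key
        self.right = right
        self.down = down


class DeterministicSkipList:
    """Top-down 1-2-3 skip list (sketch).

    Every level is a sorted linked list. A node's `down` pointer leads to the
    first node of its gap on the level below; that gap ends at the node's own
    copy. The column with key math.inf plays the role of the (n+1)st node.
    """

    def __init__(self):
        self.bottom = Node()        # sentinel under the data level
        self.bottom.right = self.bottom.down = self.bottom
        self.tail = Node()          # ends every level; key None never equals a real key
        self.tail.right = self.tail
        self.head = Node(math.inf, self.tail, self.bottom)

    def insert(self, v):
        """Insert v top-down; return False if v was already present."""
        x = self.head
        self.bottom.key = v         # lets the bottom-level splice read v via d.right.key
        inserted = True
        while x is not self.bottom:
            while v > x.key:        # find the gap we are going to drop into
                x = x.right
            d = x.down
            if d is self.bottom and v == x.key:
                inserted = False    # duplicate: stop, but still fix up the top level below
                break
            # Upper level: the gap below holds 3 nodes (d, d.right, d.right.right)
            # before x's own copy, so raise the middle one, d.right.
            # Bottom level: the same splice inserts v (x takes v, t takes x's old key).
            if d is self.bottom or x.key == d.right.right.right.key:
                t = Node(x.key, x.right, d.right.right)
                x.right = t
                x.key = d.right.key
            x = d
        if self.head.right is not self.tail:  # top level was split: add a level
            self.head = Node(math.inf, self.tail, self.head)
        return inserted

    def __contains__(self, v):
        x = self.head
        self.bottom.key = v         # search ends at bottom if v is absent
        while v != x.key:
            x = x.down if v < x.key else x.right
        return x is not self.bottom

    def keys(self):
        """All keys in order, read from the bottom (data) level."""
        x = self.head
        while x.down is not self.bottom:
            x = x.down
        out = []
        while x.key != math.inf:
            out.append(x.key)
            x = x.right
        return out
```

Usage: `sl = DeterministicSkipList(); sl.insert(50); 50 in sl`. Because every split happens on the way down, no second, bottom-up pass is needed after the new node is spliced in.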

If I want to insert 50 by the top-down insertion algorithm, it'll raise the level of node 100 before dropping into gap between Head and 150 and insert 50 right before 100.
It would not raise the level of node 100. Rather, it would raise the level of node 20. According to the algorithm, whenever you have reached the maxlimit of nodes in a gap, you raise the level of the ceil(maxlimit/2)-th node in that gap.
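A tiny sketch of the rule just stated, using the question's definitions of minlimit and maxlimit. The helper names `gap_limits` and `node_to_raise` are hypothetical, and a gap is modelled simply as the list of keys of its height-(h-1) nodes, left to right:

```python
import math


def gap_limits(order):
    """(minlimit, maxlimit) as defined in the question."""
    return math.ceil(order / 2) - 1, order - 1


def node_to_raise(gap, order):
    """Key of the node to promote before dropping into `gap`, or None.

    Once the gap holds maxlimit nodes, raise its ceil(maxlimit / 2)-th
    node (1-based), as described in the answer.
    """
    _, maxlimit = gap_limits(order)
    if len(gap) < maxlimit:
        return None
    return gap[math.ceil(maxlimit / 2) - 1]
```

For the question's order-3 list, `gap_limits(3)` gives `(1, 2)` and `node_to_raise([20, 100], 3)` returns `20`, matching the answer.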
In this instance, when node 20 is raised to level 2, there is no level-1 node between HEAD and node 20, but this does not cause any structural violation. The original structure of deterministic skip lists, as described in the paper by Munro et al., reads thus:
Assuming that in a skip list of n elements there exists a 0th and a (n+1)st node of height 1 higher than the height of the skip list, we require that between any two nodes of height h (h > 1) or higher, there exist either 1 or 2 nodes of height h – 1.
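To check a concrete configuration against this invariant, here is a small sketch that works on the sequence of node heights rather than on a real linked structure (that simplification and the function name are assumptions for illustration). For a 1-2 skip list, `lo=1` and `hi=2`:

```python
def satisfies_gap_invariant(heights, lo=1, hi=2):
    """Check the gap invariant on a left-to-right list of node heights.

    `heights` must include the 0th and (n+1)st sentinels, which are one
    level taller than any real node. Between any two nodes of height >= h
    (h > 1) there must be between lo and hi nodes of height exactly h - 1.
    """
    for h in range(2, max(heights) + 1):
        tall = [i for i, x in enumerate(heights) if x >= h]
        for a, b in zip(tall, tall[1:]):
            gap = sum(1 for x in heights[a + 1:b] if x == h - 1)
            if not lo <= gap <= hi:
                return False
    return True
```

With the question's list, HEAD, 20, 100, 150, 200, X has heights `[3, 1, 1, 2, 1, 3]` and passes; raising 100 and inserting 50 as described in the question gives `[3, 1, 1, 2, 2, 1, 3]`, which fails because nothing of height 1 sits between 100 and 150.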

Related

binary tree compaction of same subtree

Given a tree, find the common subtrees and replace the common subtrees and compact the tree.
e.g.
     1
   /   \
  2     3
 / \   / \
4   5 4   5
should be converted to
          1
        /   \
       2     3
      / \    |\
     4   5   | \
     ^   ^   |  |
     |___|___|  |
         |______|
This was asked in my interview. The approach I shared was not optimal (O(n^2)). I would be grateful if someone could help with a solution or point me to a similar problem; I couldn't find any. Thanks!
Edit: a more complex example:
        1
      /   \
     2     3
    / \   / \
   4   5 2   7
        / \
       4   5
The whole subtree rooted at 2 should be replaced:
      1
     / \
    2 <-3
   / \   \
  4   5   7
You can do this in a single DFS traversal using a hash map from (value, left_pointer, right_pointer) -> node to collapse repeated occurrences of the subtree.
As you leave each node in your DFS, you just look it up in the map. If a matching node already exists, then replace it with the pre-existing one. Otherwise, add it to the map.
This takes O(n) time, because you are comparing the actual pointers to the left + right subtrees, instead of traversing the trees to compare them. The pointer comparison gives the same result, because the left and right subtrees have already been canonicalized.
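A minimal Python sketch of that single-DFS approach (hash-consing), assuming a binary `Node` with `value`, `left` and `right` fields; the class and function names are illustrative. Passing `eq=False` to the dataclass keeps hashing and equality identity-based, so the `(value, left, right)` key compares child pointers rather than walking the subtrees:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(eq=False)  # eq=False keeps identity-based __eq__/__hash__
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def compact(root: Optional[Node]) -> Optional[Node]:
    """Share structurally identical subtrees using one post-order DFS."""
    canon: Dict[Tuple[int, Optional[Node], Optional[Node]], Node] = {}

    def visit(node: Optional[Node]) -> Optional[Node]:
        if node is None:
            return None
        # Canonicalize the children first, so equal subtrees are now the same object.
        node.left = visit(node.left)
        node.right = visit(node.right)
        key = (node.value, node.left, node.right)  # children compared by identity
        # Reuse an existing identical subtree, or register this one as canonical.
        return canon.setdefault(key, node)

    return visit(root)
```

The result is a DAG rather than a tree: for the second example, after `root = compact(root)`, `root.right.left is root.left` holds.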
First, store the node values that appear in a hash table. If the tree already exists, iterate over it, and if a node's value is already in the set, delete that node's branches. Otherwise, store the value in the hash map, and each time a new node is created, check whether its value already appears in the map.

convert a tree into a heap using minimum number of changes

Given a k-ary tree, I want to convert it into a min-heap with the minimum number of changes. A change is defined as relabelling a node.
One solution I have found is a DP that tries, for each node, either changing its value or leaving it unchanged. But isn't that going to be exponential in time complexity?
Any ideas (preferably with optimality proofs)?
Example: say the tree has edges 1-3, 3-2, 1-4, 4-5, where 1 is the root. Then I can relabel node 3 to 1 or 2; that is, with 1 change it becomes a min-heap.
If all you want to do is make sure that the tree satisfies the heap property (the key stored in each node is less than or equal to the keys stored in the node's children), then you should be able to use something like the build-heap algorithm, which operates in O(n).
Consider this tree:
              8
    ---------------------
    |         |         |
   15         6         19
  /  \        |       /  |  \
 7    3       5     12   9   22
Now, working from the bottom up, you push each node down the tree as far as it can go. That is, if the node is larger than its smallest child, you swap it with that child, and you repeat until it is no larger than any of its children or it reaches the leaf level.
For example, you look at the node valued 15. It's larger than its smallest child, so you swap it, making the subtree:
   3
  / \
 7   15
Also, 6 swaps places with 5, and 19 swaps places with 9, giving you this tree:
              8
    ---------------------
    |         |         |
    3         5         9
  /  \        |       /  |  \
 7    15      6     12   19  22
Note that at the next to leaf level, each node is smaller than its smallest child.
Now, the root. Since the rule is to swap the node with its smallest child, you swap 8 with 3, giving:
              3
    ---------------------
    |         |         |
    8         5         9
  /  \        |       /  |  \
 7    15      6     12   19  22
But you're not done because 8 is greater than 7. You swap 8 with 7, and you get this tree, which meets your conditions:
              3
    ---------------------
    |         |         |
    7         5         9
  /  \        |       /  |  \
 8    15      6     12   19  22
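The bottom-up procedure walked through above can be sketched in Python for an arbitrary k-ary tree. The `TreeNode` layout is an assumption, and this version visits nodes in post-order rather than level by level; either order guarantees that every child subtree is already a heap before its parent is sifted down. As the answer notes, this establishes the heap property but is not claimed to minimise the number of relabellings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    value: int
    children: List["TreeNode"] = field(default_factory=list)


def sift_down(node: TreeNode) -> None:
    """Swap values with the smallest child until node <= all of its children."""
    while node.children:
        smallest = min(node.children, key=lambda c: c.value)
        if node.value <= smallest.value:
            break
        node.value, smallest.value = smallest.value, node.value
        node = smallest


def heapify(root: TreeNode) -> None:
    """Bottom-up build-heap on a k-ary tree: children first, then the parent."""
    for child in root.children:
        heapify(child)
    sift_down(root)
```

Run on the example tree (8 with children 15, 6 and 19), it produces the final tree shown above: 3 with children 7 (8, 15), 5 (6) and 9 (12, 19, 22).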
If the tree is balanced, the entire procedure has complexity O(n). If the tree is severely unbalanced, the complexity is O(n^2). There is a way to guarantee O(n), regardless of the tree's initial order, but it requires changing the shape of the tree.
I won't claim that the algorithm guarantees the "minimal number of changes" for any given tree. I can prove, however, that with a balanced tree the algorithm is O(n). See https://stackoverflow.com/a/9755805/56778, which explains it for binary heap. The explanation also applies to d-ary heap.

Given a red-black tree on n nodes, what is the maximum number of red nodes on any root to leaf path?

This was a quiz question. I'm not sure whether my answer was right. Please help me out.
Let's say the height is h. Since no two consecutive nodes (as we go up the tree) can be red, wouldn't the max number of red nodes be h/2 (with h = log n)?
Somehow, I feel that is not the correct answer.
Any help/input would be greatly appreciated!
Thank you so much in advance!
** Edit ** This answer assumes that height is defined as the number of nodes on the longest path from root to leaf (as used, e.g., in some lecture notes), including the "virtual" black leaf nodes. A more common definition counts the number of edges and does not include the leaf nodes. With that definition the answer is round(h/2), and if you include the leaf nodes in the height, round_down(h/2). ** Edit ends **
If you follow the rule that the root node is black, as in Wikipedia, then the correct answer is the largest integer smaller than h/2. This is simply because the root and the leaves are black, and half of the nodes (rounded up) in between can be red, i.e. ceil((h-2)/2).
You can also find the rule just by considering some small red-black trees of different heights.
Case h=1 root is black -> 0 red nodes
Case h=2 root is black and leaves are black -> 0 red nodes
Case h=3 root is black, second level can be red, and leaves must be black -> max 1 red node
Case h=4 root is black, second level can be red, third level must be black, and leaves must be black -> max 1 red node
Case h=5 black, red, black, red, black -> max 2 red nodes.
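The case analysis above can be checked with a couple of lines of Python. Here h counts nodes on the path, including the black root and the black NIL leaf, as in this answer's definition; the function name is illustrative:

```python
def max_red_nodes(h):
    """Max red nodes on a root-to-leaf path of h nodes (black root, black leaf)."""
    between = max(h - 2, 0)        # nodes strictly between the root and the leaf
    return (between + 1) // 2      # red, black, red, ... = ceil(between / 2)
```

For h = 1..5 this gives 0, 0, 1, 1, 2, matching the cases listed, and it always equals (h - 1) // 2, the largest integer smaller than h/2.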
The h as a function of n is trickier, but it can be shown that h <= 2 log(n+1), which guarantees logarithmic search time. For a proof see, e.g., Searching and Search Trees II (page 11). The proof is based on the fact that the rules of a red-black tree guarantee that a subtree rooted at x contains at least 2^(bh(x)) - 1 internal nodes, where bh(x) is the black height: the number of black nodes on a path from x down to a leaf, not counting x. This is proven by induction. Then, noting that at least half of the nodes on any path from x down to a leaf are black (a red node cannot have a red child; we are speaking of subtrees, so x itself can be red), we get bh(x) >= h/2. Combining these results, n >= 2^bh(x) - 1 >= 2^(h/2) - 1. Solving for h, we get h <= 2 log(n+1).
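The chain of inequalities from the paragraph above, written out with x taken to be the root (so h is the height of the whole tree and n its number of internal nodes; log is base 2):

```latex
\begin{aligned}
n &\ge 2^{bh(x)} - 1 && \text{(induction on the subtree rooted at } x\text{)} \\
bh(x) &\ge h/2 && \text{(at least half the nodes on any path below } x \text{ are black)} \\
\Rightarrow\quad n &\ge 2^{h/2} - 1 \\
\Rightarrow\quad h &\le 2\log_2(n+1)
\end{aligned}
```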
As the question was a quiz, it should be enough to say that h is proportional to log(n) or even about log(n).
Let's first see how few nodes (minimising n) are needed to make a path with 1 red node (* is black):
  *
 / \
*   R
   / \
  *   *
So n must be at least 5 when 1 red node is needed. The tree has 3 leaf nodes and 2 internal nodes. Removing any node would require dropping the red node as well to stay within the rules.
If we want to extend this tree to get a path with 2 red nodes, we could apply the following two steps:
1. All leaves get two black children.
2. The right-most leaf (just added) is turned into a red node, and it gets 2 black children.
The dollar signs are the black nodes added compared to the previous tree:
           *
         /   \
       *       R
      / \    /   \
     $   $  *     *
           / \   / \
          $   $ $   R
                   / \
                  $   $
We choose to place that path with the red nodes on the right side; this choice does not influence the conclusions. Note that it does not help to add red nodes in other, shorter paths, as this will only increase the number of nodes without increasing the path with the most red nodes.
The number of leaf nodes (L) doubles with step 1, while the nodes that were leaves become internal nodes (I).
The second step increases both the number of internal nodes and the number of leaves by 1. More formally, we can find these formulas, where the index r represents the number of red nodes:
$L_1 = 3$
$I_1 = 2$
$L_{r+1} = 2L_r + 1$
$I_{r+1} = I_r + L_r + 1$
Put in a table for increasing r:
r | L | I | n=L+I
----+-----+-----+-------
1 | 3 | 2 | 5
2 | 7 | 6 | 13
3 | 15 | 14 | 29
4 | 31 | 30 | 61
... | ... | ... | ...
We can see the following is true:
$L_r = 2^{r+1} - 1$
$I_r = 2^{r+1} - 2$
And so:
$n_r = 2^{r+2} - 3$
So we have a formula for knowing the minimum number of nodes needed to have a path with r red nodes. We need a different relation: the maximum for r when given n.
From the above we can derive:
$r = \lfloor \log_2(n+3) \rfloor - 2$
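The recurrences, the closed form and the inverse formula can be cross-checked numerically. This is a sketch with illustrative function names; `int.bit_length() - 1` is used as an exact integer floor(log2) for positive integers, and n is assumed to be at least 1:

```python
def min_nodes_for_red_path(r):
    """Smallest n admitting a path with r red nodes, via the L/I recurrences."""
    leaves, internal = 3, 2                   # r = 1: the 5-node tree
    for _ in range(r - 1):
        # Right-hand side uses the old values of both counters.
        leaves, internal = 2 * leaves + 1, internal + leaves + 1
    return leaves + internal


def max_red_for_n(n):
    """r = floor(log2(n + 3)) - 2, computed without floating point."""
    return (n + 3).bit_length() - 1 - 2
```

`min_nodes_for_red_path` for r = 1..4 reproduces the table's n column (5, 13, 29, 61), and `max_red_for_n` inverts it.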

Why does pairing heap need that special two passes when delete_min?

I am reading about the pairing heap.
It is quite simple, the only tricky part is the delete_min operation.
The only non-trivial fundamental operation is the deletion of the minimum element from the heap. The standard strategy first merges the subheaps in pairs (this is the step that gave this datastructure its name) from left to right and then merges the resulting list of heaps from right to left:
I don't think I need to copy/paste the code here, as it is in the wiki link.
My questions are:
Why do they do this two-pass merging?
Why do they merge pairs first, rather than merging them all directly?
Also, why, after merging pairs, do they merge specifically from right to left?
With pairing heap, adding an item to the heap is an O(1) operation because all it does is add the node either as the new root (if it's smaller than the current root), or as the first child of the current root. So if you created a pairing heap and added the numbers 0 through 9 to it, in order, you would end up with:
              0
              |
  -------------------------
  |  |  |  |  |  |  |  |  |
  9  8  7  6  5  4  3  2  1
If you then do a delete-min, you then have to look at each child to determine the minimum item and build the new heap. If you use the naive left to right combining method, you end up with this tree:
            1
            |
  ----------------------
  |  |  |  |  |  |  |  |
  9  8  7  6  5  4  3  2
And the next time you do a delete-min you have to look at the 8 remaining children, etc. Using this technique, creating and then removing all items from the heap would be an O(n^2) operation.
The two-pass method of combining in pairs and then combining the pairs results in a much more efficient structure. Consider the first case. After deleting the minimum item, we're left with the nine children. They're combined in pairs from left to right to produce:
  8   6   4   2   1
 /   /   /   /
9   7   5   3
Then we combine the pairs right to left. In steps:
  8   6   4   1
 /   /   /   /
9   7   5   2
           /
          3
  8   6     1
 /   /     / \
9   7     2   4
         /   /
        3   5
  8         1
 /          |
9      -----------
       6    4    2
      /    /    /
     7    5    3
          1
          |
   ----------------
   8    6    4    2
  /    /    /    /
 9    7    5    3
Now, the next time we call delete-min, there are only four nodes to check, and the next time after that there will only be two. Using the two-pass combining method reduces the number of nodes at the child level by at least half. The arrangement I showed is the worst case. If the items were in ascending order, the first delete-min operation would result in a tree with only two child nodes below the root.
This is a particularly good example of the amortized complexity of pairing heap. insert is O(1), but the first delete-min after a bunch of insert operations is O(n), where n is the number of items that were inserted since the last delete-min. The beauty of the two-pass combining rule is that it quickly reorganizes the heap to reduce that O(n) complexity.
With this combining rule, the amortized complexity of delete-min is O(log n). With the strict left-to-right rule, it's O(n).
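For reference, here is a minimal Python sketch of a pairing heap whose `delete_min` uses the two-pass rule described above. The node layout and names are assumptions: children are kept in a Python list with the most recently linked child first, whereas a production implementation would normally use leftmost-child/right-sibling pointers so that linking is O(1). The heap is assumed non-empty when `find_min`/`delete_min` are called.

```python
class PairingNode:
    def __init__(self, key):
        self.key = key
        self.children = []          # most recently linked child first


def meld(a, b):
    """Link two heaps: the root with the larger key becomes the first child."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.children.insert(0, b)
    return a


def two_pass_merge(heaps):
    # Pass 1: meld adjacent pairs, left to right (an odd last heap stays alone).
    paired = [meld(heaps[i], heaps[i + 1] if i + 1 < len(heaps) else None)
              for i in range(0, len(heaps), 2)]
    # Pass 2: meld the pair results from right to left into one heap.
    result = None
    for heap in reversed(paired):
        result = meld(heap, result)
    return result


class PairingHeap:
    def __init__(self):
        self.root = None

    def insert(self, key):
        # O(1): either the new node becomes the root, or the first child of it.
        self.root = meld(self.root, PairingNode(key))

    def find_min(self):
        return self.root.key

    def delete_min(self):
        old = self.root
        self.root = two_pass_merge(old.children)
        return old.key
```

Inserting 0 through 9 and calling `delete_min()` once leaves root 1 with children 8, 6, 4 and 2, the same shape as the last diagram above.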

An interview question from Google [duplicate]

Closed 11 years ago.
Possible Duplicate:
Given a 2d array sorted in increasing order from left to right and top to bottom, what is the best way to search for a target number?
The following was asked in a Google interview:
You are given a 2D array storing integers, sorted vertically and horizontally.
Write a method that takes as input an integer and outputs a bool saying whether or not the integer is in the array.
What is the best way to do this? And what is its time complexity?
Start at the bottom-left corner of the matrix and follow the rules stated below to traverse it:
The matrix traversal is based on these conditions:
If the input number is greater than current number: Move Right
If the input number is less than current number: Move Up.
If the input number is equal to current number: Return Success
If the input number is not equal to current number and no transition is possible: Return Fail
Time complexity (thanks to Martinho Fernandes):
The time complexity is O(N+M). In the worst case, the element searched for is in the upper-right corner, meaning you'll go up N times and right M times.
Example
Input matrix:
--------------
| 1 | 4 | 6 |
--------------
| 2 | 5 | 9 |
--------------
| *3* | 8 | 10 |
--------------
Number to search: 4
Step 1:
Start at the cell where you have 3 (Bottom-Left).
3 < 4: Move Right
| 1 | 4 | 6 |
--------------
| 2 | 5 | 9 |
--------------
| 3 | *8* | 10 |
--------------
Step 2:
8 > 4: Move Up
| 1 | 4 | 6 |
--------------
| 2 | *5* | 9 |
--------------
| 3 | 8 | 10 |
--------------
Step 3:
5 > 4: Move Up
| 1 | *4* | 6 |
--------------
| 2 | 5 | 9 |
--------------
| 3 | 8 | 10 |
--------------
Step 4:
4=4: Return the index of the number
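A sketch of this bottom-left staircase search in Python, assuming the matrix is a list of equal-length rows sorted in both directions; the function name is illustrative, and it returns the (row, column) of a match or None:

```python
def staircase_search(matrix, target):
    """Start bottom-left; move right if target is larger, up if smaller."""
    if not matrix or not matrix[0]:
        return None
    row, col = len(matrix) - 1, 0
    while row >= 0 and col < len(matrix[0]):
        value = matrix[row][col]
        if value == target:
            return (row, col)
        if target > value:
            col += 1        # everything above in this column is smaller still
        else:
            row -= 1        # everything to the right in this row is larger still
    return None             # walked off the matrix: no transition possible
```

On the example matrix, searching for 4 visits 3, 8, 5, 4 and returns (0, 1); each step discards a whole row or column, which is where O(N+M) comes from.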
I would start by asking for details about what it means to be "sorted vertically and horizontally".
If the matrix is sorted in a way that the last element of each row is less than the first element of the next row, you can run a binary search on the first column to find out which row the number is in, and then run another binary search on that row. This algorithm will take O(log R + log C) time, where R and C are, respectively, the number of rows and columns. Using a property of the logarithm, one can write that as O(log(R*C)), which is the same as O(log N), if N is the number of elements in the array. This is almost the same as treating the array as 1D and running a binary search on it.
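For that fully sorted case, the two binary searches might look like this in Python. The function name is illustrative; the first search is written by hand over the first column so it stays O(log R) without copying the column, and `bisect.bisect_left` handles the search inside the row:

```python
from bisect import bisect_left


def search_fully_sorted(matrix, target):
    """Binary search the first column, then the chosen row; (row, col) or None."""
    if not matrix or not matrix[0]:
        return None
    # 1) Last row whose first element is <= target.
    lo, hi, row = 0, len(matrix) - 1, -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if matrix[mid][0] <= target:
            row, lo = mid, mid + 1
        else:
            hi = mid - 1
    if row < 0:
        return None         # target is smaller than every element
    # 2) Standard binary search inside that row.
    cells = matrix[row]
    col = bisect_left(cells, target)
    if col < len(cells) and cells[col] == target:
        return (row, col)
    return None
```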
But the matrix could be sorted in a way that the last element of each row is not less than the first element of the next row:
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
3 4 5 6 7 8 9 10 11
In this case, you could run some sort of horizontal and vertical binary search simultaneously:
Test the middle number of the first column. If it's less than the target, consider the lines above it. If it's greater, consider those below;
Test the middle number of the first considered line. If it's less, consider the columns left of it. If it's greater, consider those to the right;
Lather, rinse, repeat until you find one, or you're left with no more elements to consider;
This method is also logarithmic on the number of elements.
The first method that comes to mind is a vertical binary search, followed by a horizontal one when you find the row it should be in. Complexity will be O(log NM) where N and M are the dimensions of the array.
Further explanation:
Consider just the first number of every row. When you perform a binary search of these first numbers for the specified number, the result will be either the specified number if you're lucky, or otherwise the position before or after where the specified number would go, depending on the binary search implementation. Once you find the two first-column numbers that the specified number falls between, you know that the number is in that row, and a second binary search will find the number if it is in the row.
