Levels of recursion in merge sort - algorithm

How do I determine how many levels of recursion are necessary for merge sort to sort a list of size 8?
I am looking for only the level of recursive calls, not the return steps.
Will it be 4?
Because if I have the list {18,16,13,14,11,12,15,17}, I can sort it in 4 levels using recursion:
{18,16,13,14,11,12,15,17}
one initial call, then a level of recursion for each time you need to divide the list in half before you get down to single-element lists.
{18,16,13,14} {11,12,15,17}
{18,16} {13,14} {11,12} {15,17}
{18} {16} {13} {14} {11} {12} {15} {17}
or log2(n) levels plus 1 = log2(8) + 1 = 4

You must define the term "level of recursion" precisely: it conventionally refers to the depth of recursion, but does the first call count? In your example, you show 3 levels of recursion, but the depth of the call stack is 4 including the initial call with the full array.
Furthermore, there would be an extra level if the array size was just one greater than 8, which your formula would fail to account for.
Here is a more accurate algorithm, counting the initial call:
if the length is 0 or 1, depth is 1
otherwise, depth is 2 + floor(log2(length - 1))
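The closed form above (with the logarithm rounded down) can be checked against a direct simulation of the recursion. This is only a sketch: the function names are made up for illustration, and it assumes a top-down merge sort that splits at the midpoint and recurses until a sub-list has at most one element.

```python
def recursion_depth(n: int) -> int:
    """Call-stack depth (initial call included) of a top-down merge sort
    that keeps halving until a sub-list has at most one element."""
    if n <= 1:
        return 1
    mid = n // 2
    # The deeper of the two halves determines the overall depth.
    return 1 + max(recursion_depth(mid), recursion_depth(n - mid))


def formula_depth(n: int) -> int:
    """Closed form: 1 for n <= 1, otherwise 2 + floor(log2(n - 1))."""
    if n <= 1:
        return 1
    # For n >= 2, (n - 1).bit_length() - 1 equals floor(log2(n - 1)).
    return 2 + (n - 1).bit_length() - 1


if __name__ == "__main__":
    for n in (1, 2, 8, 9):
        print(n, recursion_depth(n), formula_depth(n))
```

For a list of size 8 both functions give 4, and for size 9 they give 5, which is the extra level mentioned above.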

Related

Replace two elements with their absolute difference and generate the minimum possible element in array

I have an array of size n and I can apply any number of operations(zero included) on it. In an operation, I can take any two elements and replace them with the absolute difference of the two elements. We have to find the minimum possible element that can be generated using the operation. (n<1000)
Here's an example of how operation works. Let the array be [1,3,4]. Applying operation on 1,3 gives [2,4] as the new array.
Ex: 2 6 11 3 => ans = 0
This is because 11-6 = 5 and 5-3 = 2 and 2-2 = 0
Ex: 20 6 4 => ans = 2
Ex: 2 6 10 14 => ans = 0
Ex: 2 6 10 => ans = 2
Can anyone tell me how can I approach this problem?
Edit:
We can use recursion to generate all possible cases and pick the minimum element from them. This would have complexity of O(n^2 !).
Another approach I tried is sorting the array and then making a recursive call where, starting from either index 0 or index 1, I apply the operation to all consecutive elements. This continues until there is only one element left in the array, and we can return the minimum at any point in the recursion. This has a complexity of O(n^2) but doesn't necessarily give the right answer.
Ex: 2 6 10 15 => (4 5) & (2 4 15) => (1) & (2 15) & (2 11) => (13) & (9). The minimum of this will be 1 which is the answer.
When you choose two elements for the operation, you subtract the smaller one from the bigger one. So if you choose 1 and 7, the result is 7 - 1 = 6.
Now having 2 6 and 8 you can do:
8 - 2 -> 6 and then 6 - 6 = 0
You may also write it like this: 8 - 2 - 6 = 0
Let's consider a different operation: you can take two elements and replace them by their sum or their difference.
Even though you can obtain completely different values using the new operation, the absolute value of the element closest to 0 will be exactly the same as using the old one.
First, let's try to solve this problem using the new operations, then we'll make sure that the answer is indeed the same as using the old ones.
What you are trying to do is choose two disjoint subsets of the initial array (not both empty), then subtract the sum of the elements of the second set from the sum of the elements of the first. You want the two subsets for which the result is as close to 0 as possible. This belongs to the same family as the subset-sum and partition problems, which are NP-hard in general, but it can be solved with a pseudopolynomial dynamic-programming algorithm similar to the one for the knapsack problem, in O(n * sum of all elements).
Each element of the initial array can belong to the positive set (the set whose sum you subtract from), to the negative set (the set whose sum you subtract), or to neither of them. In other words: each element can be added to the result, subtracted from the result, or left untouched. Let's say we have already calculated all obtainable values using the elements from the first one to the i-th one. Now we consider the (i+1)-th element. We can take any of the obtainable values and increase it or decrease it by the value of the (i+1)-th element. After doing that with all the elements we get all possible values obtainable from that array. Then we choose the one which is closest to 0.
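The set-of-obtainable-values idea described above can be sketched as follows. This is an illustration rather than a reference solution: the function name is hypothetical, the input is assumed to be a non-empty list of non-negative integers, and only absolute values are stored because flipping every sign turns any obtainable value v into -v.

```python
def min_reachable(values):
    """Smallest |signed sum| over all non-empty choices in which every
    element is added, subtracted, or left out."""
    reachable = set()  # absolute values obtainable from the elements seen so far
    for x in values:
        updated = set(reachable)         # leave x untouched
        updated.add(x)                   # start a new combination with x alone
        for v in reachable:
            updated.add(v + x)           # add x
            updated.add(abs(v - x))      # subtract x (the sign does not matter)
        reachable = updated
    return min(reachable)


if __name__ == "__main__":
    for example in ([2, 6, 11, 3], [20, 6, 4], [2, 6, 10, 14], [2, 6, 10]):
        print(example, "=>", min_reachable(example))
```

The set never holds more than (sum of all elements + 1) values, which is where the O(n * sum of all elements) bound comes from.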
Now the harder part, why is it always a correct answer?
Let's consider positive and negative sets from which we obtain minimal result. We want to achieve it using initial operations. Let's say that there are more elements in the negative set than in the positive set (otherwise swap them).
What if we have only one element in the positive set and only one element in the negative set? Then absolute value of their difference is equal to the value obtained by using our operation on it.
What if we have one element in the positive set and two in the negative one?
1) At least one of the negative elements is smaller than the positive element - then we just take that pair and use the operation on it. The result is a new element in the positive set. Then we have the previous case.
2) Both negative elements are bigger than the positive one. Then if we remove the bigger element from the negative set we get a result closer to 0, so this case cannot happen.
Let's say we have n elements in the positive set and m elements in the negative set (n <= m) and we are able to obtain the absolute value of difference of their sums (let's call it x) by using some operations. Now let's add an element to the negative set. If the difference before adding new element was negative, decreasing it by any other number makes it smaller, that is farther from 0, so it is impossible. So the difference must have been positive. Then we can use our operation on x and the new element to get the result.
Now second case: let's say we have n elements in the positive set and m elements in the negative set (n < m) and we are able to obtain the absolute value of difference of their sums (again let's call it x) by using some operations. Now we add new element to the positive set. Similarly, the difference must have been negative, so x is in the negative set. Then we obtain the result by doing the operation on x and the new element.
Using induction we can prove that the answer is always correct.

What is the number of levels in merge sort?

I am confused what is the number of levels (height of the tree) in merge sort.
Somewhere I have seen that it is given by the ceiling of the base-2 logarithm, $\lceil \log_2 n \rceil$, but elsewhere it is written as $\log_2 n + 1$.
Can anyone explain which one is correct?
The ceiling function is needed to handle the case when n is not a power of 2.
The answer in "How come the height of recursion tree in merge sort lg(n)+1" is misleading because each node in a binary tree consists of a value and two links (either of which may be null), while each stack frame in a recursive top down merge sort consists of 0 values and two indexes, iterators, or pointers.
As for the formula, if there is a check for the base case of sub-array size == 1 made by the caller, including the initial call, such as
if(end-begin > 1) call mergesort();
then the formula is number of stack frame levels = ceiling(log2(n)).
If the check for the base case is done only at the top of mergesort, then the formula is number of stack frame levels = 1 + ceiling(log2(n)). Consider the case of n = 6, shown below with half-open index ranges begin,end: there are 4 levels of stack frames if you include the initial call to mergesort (row 5 contains only returns). With top down merge sort, recursion continues until sub-array size is reduced to the base case of one element, in which case it returns:
level
1      0,6
2      0,3            3,6
3      0,1  1,3       3,4  4,6
4      ret  1,2 2,3   ret  4,5 5,6
5           ret ret        ret ret
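A small simulation can be used to check the stack-frame counts for both placements of the base-case check. It is a sketch with hypothetical function names; it assumes half-open ranges split at the midpoint and counts only frames that are actually pushed, so a row consisting solely of returns is not counted as a level.

```python
def levels_check_at_top(begin, end):
    """mergesort() is always called and returns at once when size <= 1."""
    if end - begin <= 1:
        return 1                      # the frame exists, then returns
    mid = (begin + end) // 2
    return 1 + max(levels_check_at_top(begin, mid),
                   levels_check_at_top(mid, end))


def levels_check_by_caller(begin, end):
    """The caller (including the initial one) only calls mergesort()
    when end - begin > 1, so single-element ranges get no frame."""
    if end - begin <= 1:
        return 0                      # never called
    mid = (begin + end) // 2
    return 1 + max(levels_check_by_caller(begin, mid),
                   levels_check_by_caller(mid, end))


def ceil_log2(n):
    """ceiling(log2(n)) for n >= 1, computed without floating point."""
    return (n - 1).bit_length()


if __name__ == "__main__":
    for n in (6, 7, 8, 9):
        print(n, levels_check_by_caller(0, n), levels_check_at_top(0, n))
```

For every n the version with the check at the top uses exactly one more level than the version where the caller checks.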

What happens if we iterate build-max-heap in a top-down manner

What are the disadvantages of building a heap in a top-down manner? Please include a brief time complexity calculation. In short: what happens if we use the first build-max-heap algorithm below rather than the commonly used second one?
Build-max-heap(A)
{
    A.heap-size = A.length
    for (i = 1 to [A.length]/2)
        max-heapify(A, i)
}
Build-max-heap(A)
{
    A.heap-size = A.length
    for (i = [A.length]/2 downto 1)
        max-heapify(A, i)
}
As written, your first example won't do anything because i is less than [A.length/2]. I suspect you meant your first example to be:
for (i=1 to [A.length]/2)
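Assuming that's what you meant, doing the min-heapify from the top down will not result in a valid heap (the example uses a min-heap; the same argument applies to a max-heap with the comparisons reversed). Consider the original array [4,3,2,1], which represents this tree:
      4
    3   2
  1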
On the first iteration, you want to move 4 down. So you swap it with its smallest child and get the array [2,3,4,1].
Next, you want to sift 3 down. So you swap it with its smallest child and get [2,1,4,3]. You're done now, and your "heap" looks like this:
      2
    1   4
  3
This is not a valid heap, because 1 sits below 2.
When you go from the middle up, the smallest item can sift its way to the top. But when you go from the top down, it's possible for the smallest item never to reach the top.
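The [4,3,2,1] example can be reproduced with a short sketch. The function names are made up for illustration, indices are 0-based (unlike the 1-based pseudocode in the question), and a min-heap is used to match the example.

```python
def min_heapify(a, i):
    """Sift a[i] down until neither child is smaller."""
    n = len(a)
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] < a[smallest]:
            smallest = left
        if right < n and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest


def build_top_down(values):
    """Heapify the internal nodes from the root towards the middle."""
    a = list(values)
    for i in range(len(a) // 2):
        min_heapify(a, i)
    return a


def build_bottom_up(values):
    """Heapify the internal nodes from the middle towards the root."""
    a = list(values)
    for i in reversed(range(len(a) // 2)):
        min_heapify(a, i)
    return a


def is_min_heap(a):
    """True when every element is at least as large as its parent."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


if __name__ == "__main__":
    print(build_top_down([4, 3, 2, 1]), is_min_heap(build_top_down([4, 3, 2, 1])))
    print(build_bottom_up([4, 3, 2, 1]), is_min_heap(build_bottom_up([4, 3, 2, 1])))
```

Both loops call the sift-down routine the same number of times, so the top-down order is no cheaper; it simply fails to establish the heap property.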
A max or min heap is an implementation of a nested max or min function,
e.g. max(max(max(a, b), max(c, d)), ...), it is a kind of an expression tree for min() or max() of all array elements, that is, you are implementing max(a, b, c, ...) or min(a, b, c, ...). To yield the correct result you need to gather the min or max elements to compare. To do that you need to do a broad comparison of the bottom elements, then going up, the number of elements you need to compare is divided by 2 per level (one half are eliminated per level). Going from top to bottom will not yield the correct result; you are implementing the wrong expression.

minimum switch to sorted permutation

Suppose I have an array like this:
[5 4 1 2 3]
And I want to compute the minimum number of switches of adjacent elements I have to make to sort the unsorted permutation.
Now the answer is 7 in this case. Just move 4 and 5 to the right, or move 1, 2, 3 to the left.
The irony, though, is that I used [4 5 1 2 3] in my notes, which gives 6, and so I misled myself and made a fool of myself.
Steps:
[5 1 4 2 3] // step 1
[1 5 4 2 3] // step 2
[1 5 2 4 3] // step 3
[1 2 5 4 3] // step 4
[1 2 5 3 4] // step 5
[1 2 3 5 4] // step 6
[1 2 3 4 5] // step 7
I've thought of things like keeping an array of the offsets needed and, on each loop, just looking for the switch that moves the whole thing closer to the goal.
But that just seems too slow. Any ideas?
EDIT:
from comment: are the members of the array guaranteed to completely belong to {1..N} set for an array of size N, without repeating numbers?
Nope. It's not guaranteed not to repeat or being in [1...n] for array sized N.
UPDATE:
There are two solutions to this particular problem: one is the slower but more straightforward bubblesort, the other is the faster but less straightforward mergesort.
With bubblesort, you basically count the number of switches when running the algorithm.
With mergesort, it's a bit trickier, but the counting happens when merging. When the array is already sorted, the count should yield 0, as no switches will be needed to sort this array. With bubblesort, you count the switches when you push the largest or the smallest number to the left or right. With mergesort, you count switches when merging. A bit of brute forcing by hand will get you there.
What you're actually looking for is calculating the number of inversions in a sequence.
This can be done in O(n*logn) using mergesort, for example.
Here you have an article about this subject, looks quite understandable.
Some more links:
https://stackoverflow.com/a/338252/2180475
https://codereview.stackexchange.com/questions/12922/inversion-count-using-merge-sort
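As a rough illustration of the mergesort approach mentioned above, the sketch below counts inversions while merging. The function name is hypothetical; equal elements are deliberately not counted as inversions, since the question allows repeated values.

```python
def count_inversions(values):
    """Return (sorted_copy, inversion_count) using merge sort."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    mid = n // 2
    left, left_count = count_inversions(values[:mid])
    right, right_count = count_inversions(values[mid:])
    merged = []
    count = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # equal elements are not an inversion
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            # right[j] jumps over every element still waiting in left
            count += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count


if __name__ == "__main__":
    print(count_inversions([5, 4, 1, 2, 3])[1])
    print(count_inversions([4, 5, 1, 2, 3])[1])
```

On the two arrays from the question this yields 7 and 6 respectively, matching the hand-counted switches.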
This looks suspiciously similar to bubble sort, in which you need up to n^2 movements.
And the interesting fact is that, simple bubble sort actually achieves your goal to find the minimum number of switches! (proof below)
In that case there is no need for a cleverer algorithm if quadratic time is acceptable; a plain double loop does the job (in C++):
int swaps = 0;  // named swaps because switch is a reserved word in C++
for (int repeat = 0; repeat < n; repeat++) {
    // after each pass the largest remaining element is in its final place
    for (int j = 0; j + 1 < n - repeat; j++) {  // j + 1 must stay inside the array
        if (arr[j] > arr[j + 1]) {
            int tmp = arr[j];
            arr[j] = arr[j + 1];
            arr[j + 1] = tmp;
            swaps = swaps + 1;
        }
    }
}
swaps is the result.
arr is the array containing the numbers.
n is the length of the array.
Proof that this produces the minimum number of switches:
First, note that bubble sort essentially moves the highest element into the rightmost position in the array at each iteration of the outer loop.
Note that switching the highest element with any other element in the process does not change the relative order of the other elements. Likewise, any other switch operations done in between our attempts to move the highest element to its position will not change the number of switches required to move the highest element into place. So we can reorder the switch operations such that the highest element is always switched first until it gets into position. Therefore switching the highest element into its position one step at a time is optimal.

In what order should you insert a set of known keys into a B-Tree to get minimal height?

Given a fixed number of keys or values (stored either in an array or in some data structure) and the order of the b-tree, can we determine the sequence of key insertions that would generate a space-efficient b-tree?
To illustrate, consider b-tree of order 3. Let the keys be {1,2,3,4,5,6,7}. Inserting elements into tree in the following order
for (int i = 1; i < 8; ++i)
{
    tree.push(i);
}
would create a tree like this
      4
  2       6
1   3   5   7
see http://en.wikipedia.org/wiki/B-tree
But inserting elements in this way
bool flag = true;
for (int i = 1, j = 7; i < 8; ++i, --j)
{
    if (flag)
    {
        tree.push(i);
        flag = false;
    }
    else
    {
        tree.push(j);
        flag = true;
    }
}
creates a tree like this
    3   5
1 2   4   6 7
where we can see there is a decrease in the number of levels.
So is there a particular way to determine sequence of insertion which would reduce space consumption?
The following trick should work for most ordered search trees, assuming the data to insert are the integers 1..n.
Consider the binary representation of your integer keys - for 1..7 (with dots for zeros) that's...
Bit : 210
1 : ..1
2 : .1.
3 : .11
4 : 1..
5 : 1.1
6 : 11.
7 : 111
Bit 2 changes least often, Bit 0 changes most often. That's the opposite of what we want, so what if we reverse the order of those bits, then sort our keys in order of this bit-reversed value...
Bit : 210 Rev
4 : 1.. -> ..1 : 1
------------------
2 : .1. -> .1. : 2
6 : 11. -> .11 : 3
------------------
1 : ..1 -> 1.. : 4
5 : 1.1 -> 1.1 : 5
3 : .11 -> 11. : 6
7 : 111 -> 111 : 7
It's easiest to explain this in terms of an unbalanced binary search tree, growing by adding leaves. The first item is dead centre - it's exactly the item we want for the root. Then we add the keys for the next layer down. Finally, we add the leaf layer. At every step, the tree is as balanced as it can be, so even if you happen to be building an AVL or red-black balanced tree, the rebalancing logic should never be invoked.
[EDIT I just realised you don't need to sort the data based on those bit-reversed values in order to access the keys in that order. The trick is to notice that bit-reversing is its own inverse. As well as mapping keys to positions, it maps positions to keys. So if you loop through from 1..n, you can use the bit-reversed value to decide which item to insert next - for the first insert use the 4th item, for the second insert use the second item and so on. One complication - you have to round n upwards to one less than a power of two (7 is OK, but use 15 instead of 8) and you have to bounds-check the bit-reversed values. The reason is that bit-reversing can move some in-bounds positions out-of-bounds and vice versa.]
Actually, for a red-black tree some rebalancing logic will be invoked, but it should just be re-colouring nodes - not rearranging them. However, I haven't double checked, so don't rely on this claim.
For a B tree, the height of the tree grows by adding a new root. Proving this works is, therefore, a little awkward (and it may require a more careful node-splitting than a B tree normally requires) but the basic idea is the same. Although rebalancing occurs, it occurs in a balanced way because of the order of inserts.
This can be generalised for any set of known-in-advance keys because, once the keys are sorted, you can assign suitable indexes based on that sorted order.
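The bit-reversal ordering described above can be sketched like this. The function names are hypothetical, the keys are assumed to be the positions 1..n of already-sorted data, and the tree used for the check is a plain unbalanced binary search tree, not a B-tree.

```python
def bit_reversed_order(n):
    """Positions 1..n in the insertion order given by bit reversal."""
    bits = n.bit_length()                 # rounds n up to 2**bits - 1
    order = []
    for i in range(1, 2 ** bits):
        reversed_i = int(format(i, f"0{bits}b")[::-1], 2)
        if reversed_i <= n:               # bounds check after reversing
            order.append(reversed_i)
    return order


def naive_bst_height(keys):
    """Number of levels of an unbalanced BST built by inserting keys in order."""
    root = None                           # a node is [key, left, right]
    height = 0
    for key in keys:
        if root is None:
            root = [key, None, None]
            height = 1
            continue
        node, depth = root, 1
        while True:
            side = 1 if key < node[0] else 2
            depth += 1
            if node[side] is None:
                node[side] = [key, None, None]
                break
            node = node[side]
        height = max(height, depth)
    return height


if __name__ == "__main__":
    print(bit_reversed_order(7))
    print(naive_bst_height(bit_reversed_order(7)), naive_bst_height(range(1, 8)))
```

For n = 7 the order is 4, 2, 6, 1, 5, 3, 7, the same sequence as in the table above, and the resulting tree has 3 levels instead of the 7 produced by inserting 1..7 in increasing order.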
WARNING - This isn't an efficient way to construct a perfectly balanced tree from known already-sorted data.
If you have your data already sorted, and know its size, you can build a perfectly balanced tree in O(n) time. Here's some pseudocode...
if size is zero, return null
from the size, decide which index should be the (subtree) root
recurse for the left subtree, giving that index as the size (assuming 0 is a valid index)
take the next item to build the (subtree) root
recurse for the right subtree, giving (size - (index + 1)) as the size
add the left and right subtree results as the child pointers
return the new (subtree) root
Basically, this decides the structure of the tree based on the size and traverses that structure, building the actual nodes along the way. It shouldn't be too hard to adapt it for B Trees.
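The pseudocode above might look roughly like this for a binary search tree. It is a sketch under a few assumptions: the function names are invented, nodes are plain (key, left, right) tuples, the subtree root index is taken to be size // 2 (the pseudocode leaves that choice open), and the sorted data is consumed through an iterator so each item is touched once.

```python
def build_balanced(items, size):
    """Build a height-balanced BST from the next `size` items of the
    iterator `items`, which must yield keys in sorted order."""
    if size == 0:
        return None
    root_index = size // 2                    # assumed choice of subtree root
    left = build_balanced(items, root_index)  # consumes root_index items
    key = next(items)                         # next item becomes the root
    right = build_balanced(items, size - (root_index + 1))
    return (key, left, right)


def tree_height(node):
    """Number of levels in the tree."""
    if node is None:
        return 0
    return 1 + max(tree_height(node[1]), tree_height(node[2]))


def in_order(node):
    """Keys in ascending order, used to confirm nothing was lost."""
    if node is None:
        return []
    return in_order(node[1]) + [node[0]] + in_order(node[2])


if __name__ == "__main__":
    keys = [1, 2, 3, 4, 5, 6, 7]
    tree = build_balanced(iter(keys), len(keys))
    print(tree[0], tree_height(tree), in_order(tree))
```

Because the structure is decided purely from the size, no comparisons or rebalancing steps are needed while building.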
This is how I would add elements to b-tree.
Thanks to Steve314, for giving me the start with binary representation,
We are given n elements to add, in order, and have to add them to an m-order b-tree. Take their indexes (1...n) and convert them to radix m. The main idea of this insertion is to insert the numbers with the highest m-radix bit first and keep them above the numbers with lesser m-radix bits added to the tree, despite the splitting of nodes.
1,2,3.. are indexes so you actually insert the numbers they point to.
For example, order-4 tree
4 8 12 highest radix bit numbers
1,2,3 5,6,7 9,10,11 13,14,15
Now depending on order median can be:
order is even -> number of keys are odd -> median is middle (mid median)
order is odd -> number of keys are even -> left median or right median
The choice of median (left/right) to be promoted will decide the order in which I should insert elements. This has to be fixed for the b-tree.
I add elements to trees in buckets. First I add bucket elements then on completion next bucket in order. Buckets can be easily created if median is known, bucket size is order m.
I take left median for promotion. Choosing bucket for insertion.
| 4 | 8 | 12 |
1,2,|3 5,6,|7 9,10,|11 13,14,|15
3 2 1 Order to insert buckets.
For left-median choice I insert buckets to the tree starting from right side, for right median choice I insert buckets from left side. Choosing left-median we insert median first, then elements to left of it first then rest of the numbers in the bucket.
Example
Bucket median first
12,
Add elements to left
11,12,
Then after all elements inserted it looks like,
| 12 |
|11 13,14,|
Then I choose the bucket left to it. And repeat the same process.
Median
12
8,11 13,14,
Add elements to left first
12
7,8,11 13,14,
Adding rest
8 | 12
7 9,10,|11 13,14,
Similarly keep adding all the numbers,
4 | 8 | 12
3 5,6,|7 9,10,|11 13,14,
At the end add numbers left out from buckets.
| 4 | 8 | 12 |
1,2,|3 5,6,|7 9,10,|11 13,14,|15
For mid-median (even order b-trees) you simply insert the median and then all the numbers in the bucket.
For right-median I add buckets from the left. For elements within the bucket I first insert median then right elements and then left elements.
Here we are adding the highest m-radix numbers, and in the process I added numbers with immediate lesser m-radix bit, making sure the highest m-radix numbers stay at top. Here I have only two levels, for more levels I repeat the same process in descending order of radix bits.
Last case is when remaining elements are of same radix-bit and there is no numbers with lesser radix-bit, then simply insert them and finish the procedure.
I would give an example for 3 levels, but it is too long to show. So please try with other parameters and tell if it works.
Unfortunately, all trees exhibit their worst case scenario running times, and require rigid balancing techniques when data is entered in increasing order like that. Binary trees quickly turn into linked lists, etc.
For typical B-Tree use cases (databases, filesystems, etc), you can typically count on your data naturally being more distributed, producing a tree more like your second example.
Though if it is really a concern, you could hash each key, guaranteeing a wider distribution of values.
for( i=1; i<8; ++i )
tree.push(hash(i));
To build a particular B-tree using Insert() as a black box, work backward. Given a nonempty B-tree, find a node with more than the minimum number of children that's as close to the leaves as possible. The root is considered to have minimum 0, so a node with the minimum number of children always exists. Delete a value from this node to be prepended to the list of Insert() calls. Work toward the leaves, merging subtrees.
For example, given the 2-3 tree
8
4 c
2 6 a e
1 3 5 7 9 b d f,
we choose 8 and do merges to obtain the predecessor
4 c
2 6 a e
1 3 5 79 b d f.
Then we choose 9.
4 c
2 6 a e
1 3 5 7 b d f
Then a.
4 c
2 6 e
1 3 5 7b d f
Then b.
4 c
2 6 e
1 3 5 7 d f
Then c.
4
2 6 e
1 3 5 7d f
Et cetera.
So is there a particular way to determine sequence of insertion which would reduce space consumption?
Edit note: since the question was quite interesting, I try to improve my answer with a bit of Haskell.
Let k be the Knuth order of the B-Tree and list a list of keys
The minimization of space consumption has a trivial solution:
-- won't use point free notation to ease haskell newbies
trivial k list = concat $ reverse $ chunksOf (k-1) $ sort list
Such an algorithm will efficiently produce a time-inefficient B-Tree, unbalanced on the left but with minimal space consumption.
A lot of non trivial solutions exist that are less efficient to produce but show better lookup performance (lower height/depth). As you know, it's all about trade-offs!
A simple algorithm that minimizes both the B-Tree depth and the space consumption (but doesn't give the best lookup performance!) is the following
-- Sort the list in increasing order and call sortByBTreeSpaceConsumption
-- with the result
smart k list = sortByBTreeSpaceConsumption k $ sort list
-- Sort list so that inserting in a B-Tree with Knuth order = k
-- will produce a B-Tree with minimal space consumption and minimal depth
-- (but not best performance)
sortByBTreeSpaceConsumption :: Ord a => Int -> [a] -> [a]
sortByBTreeSpaceConsumption _ [] = []
sortByBTreeSpaceConsumption k list
    | k - 1 >= numOfItems = list -- this will be a leaf
    | otherwise = heads ++ tails ++ sortByBTreeSpaceConsumption k remainder
    where requiredLayers = minNumberOfLayersToArrange k list
          numOfItems = length list
          capacityOfInnerLayers = capacityOfBTree k $ requiredLayers - 1
          blockSize = capacityOfInnerLayers + 1
          blocks = chunksOf blockSize balanced
          heads = map last blocks
          tails = concat $ map (sortByBTreeSpaceConsumption k . init) blocks
          balanced = take (numOfItems - (mod numOfItems blockSize)) list
          remainder = drop (numOfItems - (mod numOfItems blockSize)) list
-- Capacity of a layer n in a B-Tree with Knuth order = k
layerCapacity k 0 = k - 1
layerCapacity k n = k * layerCapacity k (n - 1)
-- Infinite list of capacities of layers in a B-Tree with Knuth order = k
capacitiesOfLayers k = map (layerCapacity k) [0..]
-- Capacity of a B-Tree with Knuth order = k and l layers
capacityOfBTree k l = sum $ take l $ capacitiesOfLayers k
-- Infinite list of capacities of B-Trees with Knuth order = k
-- as the number of layers increases
capacitiesOfBTree k = map (capacityOfBTree k) [1..]
-- compute the minimum number of layers in a B-Tree of Knuth order k
-- required to store the items in list
minNumberOfLayersToArrange k list = 1 + f k
    where numOfItems = length list
          f = length . takeWhile (< numOfItems) . capacitiesOfBTree
With this smart function given a list = [21, 18, 16, 9, 12, 7, 6, 5, 1, 2] and a B-Tree with knuth order = 3 we should obtain [18, 5, 9, 1, 2, 6, 7, 12, 16, 21] with a resulting B-Tree like
              [18, 21]
             /
        [5 , 9]
       /   |   \
  [1,2]  [6,7]  [12, 16]
Obviously this is suboptimal from a performance point of view, but should be acceptable, since obtaining a better one (like the following) would be far more expensive (computationally and economically):
          [7 , 16]
         /    |    \
    [5,6]  [9,12]  [18, 21]
    /
[1,2]
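The layer arithmetic used by the Haskell helpers can be cross-checked with a few lines of Python. This is only a sketch with hypothetical names; it assumes, as the Haskell code does, that layer i of a full B-Tree of Knuth order k holds (k - 1) * k^i keys, so a tree with l layers holds k^l - 1 keys in total.

```python
def btree_capacity(k, layers):
    """Maximum number of keys in a B-Tree of Knuth order k with the
    given number of layers (layer i holds (k - 1) * k**i keys)."""
    return sum((k - 1) * k ** i for i in range(layers))


def min_layers(k, n_keys):
    """Smallest number of layers whose capacity is at least n_keys."""
    layers = 1
    while btree_capacity(k, layers) < n_keys:
        layers += 1
    return layers


if __name__ == "__main__":
    print(btree_capacity(3, 2), btree_capacity(3, 3), min_layers(3, 10))
```

With k = 3 two layers hold at most 8 keys, so the 10 keys of the example need 3 layers, which is the depth of both trees drawn above.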
If you want to run it, put the Haskell code above in a Main.hs file and compile it with ghc after prepending the following (chunksOf comes from Data.List.Split, which is provided by the split package):
import Data.List (sort)
import Data.List.Split
import System.Environment (getArgs)
main = do
    args <- getArgs
    let knuthOrder = read $ head args
    let keys = (map read $ tail args) :: [Int]
    putStr "smart: "
    putStrLn $ show $ smart knuthOrder keys
    putStr "trivial: "
    putStrLn $ show $ trivial knuthOrder keys

Resources