Number of Comparisons in Merge-Sort - algorithm

I was studying merge-sort when I ran into the claim that the number of comparisons in merge-sort (in the worst case, according to Wikipedia) equals n⌈lg n⌉ − 2^⌈lg n⌉ + 1; in fact it's between (n lg n − n + 1) and (n lg n + n + O(lg n)). The problem is that I cannot figure out what these formulas are trying to say. I know O(n log n) is the complexity of merge-sort, but the number of comparisons?

Why to count comparisons
There are basically two operations to any sorting algorithm: comparing data and moving data. In many cases, comparing will be more expensive than moving. Think about long strings in a reference-based typing system: moving data will simply exchange pointers, but comparing might require iterating over a large common part of the strings before the first difference is found. So in this sense, comparison might well be the operation to focus on.
Why an exact count
The numbers appear to be more detailed: instead of simply giving some Landau symbol (big-Oh notation) for the complexity, you get an actual number. Once you have decided what a basic operation is, like a comparison in this case, this approach of actually counting operations becomes feasible. This is particularly important when comparing the constants hidden by the Landau symbol, or when examining the non-asymptotic case of small inputs.
Why this exact count formula
Note that throughout this discussion, lg denotes the logarithm with base 2. When you merge-sort n elements, you have ⌈lg n⌉ levels of merges. Assume you place ⌈lg n⌉ coins on each element to be sorted, and that each comparison costs one coin. This will certainly be enough to pay for all the merges, as each element will be included in ⌈lg n⌉ merges, and each merge won't take more comparisons than the number of elements involved. So this is the n⌈lg n⌉ from your formula.
As a merge of two arrays of length m and n takes only m + n − 1 comparisons, you still have coins left at the end, one from each merge. Let us for the moment assume that all our array lengths are powers of two, i.e. that you always have m = n. Then the total number of merges is n − 1 (sum of powers of two). Using the fact that n is a power of two, this can also be written as 2^⌈lg n⌉ − 1, and subtracting that number of returned coins from the number of all coins yields n⌈lg n⌉ − 2^⌈lg n⌉ + 1 as required.
If n is 1 less than a power of two, then there are ⌈lg n⌉ merges in which one element fewer is involved. This includes a merge of two one-element lists, which used to take one comparison and which now disappears altogether. So the total cost reduces by ⌈lg n⌉, which is exactly the number of coins you'd have placed on the last element if n were a power of two. So you have to place fewer coins up front, but you get back the same number of coins. This is the reason why the formula has 2^⌈lg n⌉ instead of n: the value remains the same unless you drop to a smaller power of two. The same argument holds if the difference between n and the next power of two is greater than 1.
On the whole, this results in the formula given in Wikipedia:
n⌈lg n⌉ − 2^⌈lg n⌉ + 1
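If you want to convince yourself numerically, here is a small Python sketch (my own illustration, not part of the Wikipedia article) that checks this closed form against the standard worst-case recurrence W(1) = 0, W(n) = W(⌈n/2⌉) + W(⌊n/2⌋) + (n − 1):

from functools import lru_cache

@lru_cache(maxsize=None)
def worst_case_comparisons(n):
    # W(1) = 0; merging two runs of total length n costs at most n - 1 comparisons.
    if n <= 1:
        return 0
    return worst_case_comparisons((n + 1) // 2) + worst_case_comparisons(n // 2) + (n - 1)

def closed_form(n):
    # n*ceil(lg n) - 2^ceil(lg n) + 1; (n - 1).bit_length() equals ceil(lg n) for n >= 1.
    c = (n - 1).bit_length()
    return n * c - 2**c + 1

for n in range(1, 2000):
    assert worst_case_comparisons(n) == closed_form(n)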
Note: I'm pretty happy with the above proof. For those who like my formulation, feel free to distribute it, but don't forget to attribute it to me as the license requires.
Why this lower bound
To prove the lower bound formula, let's write ⌈lg n⌉ = lg n + d with 0 ≤ d < 1. Now the formula above can be written as
n(lg n + d) − 2^(lg n + d) + 1 =
n lg n + nd − n·2^d + 1 =
n lg n − n(2^d − d) + 1 ≥
n lg n − n + 1
where the inequality holds because 2^d − d ≤ 1 for 0 ≤ d < 1.
Why this upper bound
I must confess, I'm rather confused why anyone would name n lg n + n + O(lg n) as an upper bound. Even if you wanted to avoid the floor function, the computation above suggests something like n lg n − 0.9n + 1 as a much tighter upper bound for the exact formula. 2^d − d has its minimum (ln(ln(2)) + 1)/ln(2) ≈ 0.914 for d = −ln(ln(2))/ln(2) ≈ 0.529.
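A quick numeric spot-check of these bounds (my own sketch; I use 0.91 instead of 0.914 to stay safely below the minimum of 2^d − d):

import math

def exact(n):
    c = (n - 1).bit_length()            # ceil(lg n)
    return n * c - 2**c + 1

for n in range(2, 10000):
    lower = n * math.log2(n) - n + 1
    upper = n * math.log2(n) - 0.91 * n + 1
    assert lower - 1e-6 <= exact(n) <= upper + 1e-6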
I can only guess that the quoted formula occurs in some publication, either as a rather loose bound for this algorithm, or as the exact number of comparisons for some other algorithm which is compared against this one.
(Two different counts)
This issue has been resolved by the comment below; one formula was originally quoted incorrectly.
equals (n lg n - n + 1); in fact it's between (n lg n - n + 1) and (n lg n + n + O(lg n))
If the first part is true, the second is trivially true as well, but explicitly stating the upper bound seems kind of pointless. I haven't looked at the details myself, but these two statements appear strange when taken together like this. Either the first one really is true, in which case I'd omit the second one as it is only confusing, or the second one is true, in which case the first one is wrong and should be omitted.

Related

Why is a list sort nlogn

I have tried googling this but have been confused, as it is the very start of an online course and we have not been introduced to concepts such as merge sort.
We are given the pseudo code below and told it has nlogn operations.
MaxPairwiseProductBySorting(A[1 . . . n]):
Sort(A)
return A[n − 1] · A[n]
I understand why something like the code below could take n^2 operations, but I am totally lost as to where the n log n in the former comes from.
MaxPairwiseProductNaive(A[1 . . . n]):
product ← 0
for i from 1 to n:
    for j from i + 1 to n:
        product ← max(product, A[i] · A[j])
return product
There are lots of ways to sort lists. Under certain conditions a list can be sorted as quickly as O(n), but generally it will take O(n log n). The exact analysis depends on the specific sort, but the gist is that most of these sorts work like this:
Break the problem of sorting the list into 2 smaller sorting problems
Repeat
... with some way of handling very small sorts.
The log(n) comes from repeatedly splitting the problem. The n comes from the fact that we have to sort all of the parts, which will total to n since we haven't gotten rid of anything.
It would help you to read up on a specific sort to understand this better. Mergesort & quicksort are two common sorts, and Wikipedia has good articles on both.
The assumption in this code is that you're dealing with a so-called comparison sort that orders the elements of the array by comparing two of them at a time.
Now, for sorting n elements that way, we can draw a decision tree that records every possible sequence of binary comparison outcomes, starting from an arbitrary input permutation, until the order is determined. The minimum achievable height of that decision tree is then a lower bound for any comparison-sort algorithm.
E.g. for n = 3:
                      1:2
           <=                    >
          2:3                   2:3
      <=       >            <=       >
  {1,2,3}     1:3           1:3     {3,2,1}
            <=   >        <=   >
       {1,3,2} {3,1,2} {2,1,3} {2,3,1}
The decision tree must obviously contain all possible permutations of n values as leaves; otherwise there would exist input permutations that our algorithm couldn't properly sort. This means the tree must have at least n! leaves. So we have
n! <= #leaves <= 2^h
where h is the height of our tree. Taking the logarithm of both sides, we obtain
h >= lg n! = lg 1 + lg 2 + ... + lg n >= (n/2) lg(n/2)
So h = Omega(n lg n). Since h is the length of the longest path in the decision-tree, it is also the lower bound on the number of comparisons in the worst case. So any comparison-sort is Omega(n lg n).
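To see how tight this information-theoretic bound is in practice, here is a small sketch (my own) comparing ⌈lg(n!)⌉ with merge sort's worst-case count n⌈lg n⌉ − 2^⌈lg n⌉ + 1:

import math

def info_lower_bound(n):
    # ceil(lg(n!)): no comparison sort can do better in the worst case.
    return math.ceil(math.log2(math.factorial(n)))

def mergesort_worst(n):
    c = (n - 1).bit_length()            # ceil(lg n)
    return n * c - 2**c + 1

for n in (4, 8, 16, 32, 64, 128):
    print(n, info_lower_bound(n), mergesort_worst(n))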

Does the asymptotic complexity of a multiplication algorithm only rely on the larger of the two operands?

I'm taking an algorithms class and I repeatedly have trouble when I'm asked to analyze the runtime of code when there is a line with multiplication or division. How can I find big-theta of multiplying an n digit number with an m digit number (where n>m)? Is it the same as multiplying two n digit numbers?
For example, right now I'm attempting to analyze the following line of code:
return n*count/100
where count is at most 100. Is the asymptotic complexity of this any different from n*n/100? or n*n/n?
You can always look up here Computational complexity of mathematical operations.
In your case the complexity of n*count/100 is O(length(n)), as 100 is a constant and length(count) is at most 3.
In general, multiplication of two numbers of n and m digits takes O(nm), and the same is required for division. Here I assume we are talking about long multiplication and long division. There are many sophisticated algorithms which beat this complexity.
To make things clearer, I will provide an example. Suppose you have three numbers:
A - n digits length
B - m digits length
C - p digits length
Find complexity of the following formula:
A * B / C
Multiply first. The complexity of A * B is O(nm), and as a result we get a number D which is n + m digits long. Now consider D / C: here the complexity is O((n + m)p), so the overall complexity is the sum of the two, O(nm + (n + m)p) = O(m(n + p) + np).
Divide first. We divide B / C; the complexity is O(mp) and we get an (at most) m-digit number E. Now we calculate A * E; here the complexity is O(nm). Again the overall complexity is O(mp + nm) = O(m(n + p)).
From the analysis you can see that it is beneficial to divide first. Of course, in a real-life situation you would account for numerical stability as well.
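To make the comparison concrete, here is a tiny cost calculator based on the schoolbook model above (my own sketch; the digit lengths are made up):

def cost_multiply_first(n, m, p):
    # A*B costs n*m; dividing the (n+m)-digit result by C costs (n+m)*p.
    return n * m + (n + m) * p

def cost_divide_first(n, m, p):
    # B/C costs m*p; multiplying A by the (at most m-digit) quotient costs n*m.
    return m * p + n * m

n, m, p = 1000, 500, 200
print(cost_multiply_first(n, m, p))  # 800000
print(cost_divide_first(n, m, p))    # 600000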
From Modern Computer Arithmetic:
Assume the larger operand has size m, and the smaller has size n ≤ m, and denote by M(m,n) the corresponding multiplication cost.
When m is an exact multiple of n, say m = kn, a trivial strategy is to cut the larger operand into k pieces, giving M(kn,n) = kM(n) + O(kn).
Suppose m ≥ n and n is large. To use an evaluation-interpolation scheme, we need to evaluate the product at m + n points, whereas balanced k by k multiplication needs 2k points. Taking k ≈ (m+n)/2, we see that M(m,n) ≤ M((m + n)/2)(1 + o(1)) as n → ∞. On the other hand, from the discussion above, we have M(m,n) ≤ ⌈m/n⌉M(n)(1 + o(1)).

Divide Set into pairs of elements with minimum difference between these elements

I need an algorithm which can help me divide an N-element array into pairs, where the elements in each pair must have minimum difference.
So, I assume that we may throw away at most one element (in the case when N is odd). Let a1 ≤ a2 ≤ … ≤ aN be our set of numbers sorted in non-decreasing order. Let f(k) be the minimum possible sum of differences inside pairs formed by the first k numbers. (So the minimum is taken over all partitions of the first k numbers into pairs.) As mentioned in the comments, for even N we simply pair up consecutive elements of the sorted order. Then
f(0) = f(1) = 0,
f(2k) = f(2k − 2) + (a_{2k} − a_{2k−1}) for k ≥ 1, and
f(2k + 1) = min { f(2k), f(2k − 1) + (a_{2k+1} − a_{2k}) } for k ≥ 1.
The last formula means that we can either throw away the (2k+1)-th element or throw away one element among the first (2k − 1). On the remaining 2k elements we apply the known solution for even N.
This gives a way to find the numerical answer in O(N) time after sorting the numbers, i.e. in O(N log N) time overall. After that, if you need the actual partition of the numbers into pairs, just use the backward step of the dynamic programming. For even N just make the pairs trivially. For odd N, throw away a_N if f(N) = f(N − 1); otherwise take the pair (a_N, a_{N−1}) and decrease N by 2.
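Here is a short Python sketch of the DP described above (my own code, not the answerer's); it only returns the minimum sum, and the backward step for recovering the pairs is omitted:

def min_pair_difference_sum(a):
    # f[t]: minimum sum of in-pair differences using the first t sorted numbers,
    # throwing away one of them when t is odd.
    a = sorted(a)
    n = len(a)
    f = [0] * (n + 1)
    for t in range(2, n + 1):
        if t % 2 == 0:
            f[t] = f[t - 2] + (a[t - 1] - a[t - 2])
        else:
            f[t] = min(f[t - 1], f[t - 2] + (a[t - 1] - a[t - 2]))
    return f[n]

print(min_pair_difference_sum([5, 1, 8, 2, 11]))  # 4, e.g. pairs (1,2) and (5,8) with 11 dropped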

Calculate big Theta bound for 2 recursive calls

T(m,n) = 2T(m/2,n)+n, assume T(m,n) is constant if either m<2 or n<2
So what I don't understand is, can this problem be solved using Master Theorem? If so how? If not, is this table correct?
level   # of instances   size        cost per instance   total cost
0       1                m, n        n                   n
1       2                m/2, n      n                   2n
2       4                m/4, n      n                   4n
i       2^i              m/(2^i), n  n                   2^i * n
k       m                1, n        n                   n*m
Thanks!
The Master Theorem might be a bit of overkill here, and your solution method is not bad (log means the logarithm to base 2, c = T(1,n)):
T(m,n) = n + 2T(m/2,n) = n + 2n + 4T(m/4,n) = ... = n*(1 + 2 + 4 + ... + 2^(log(m)-1)) + 2^log(m)*c
= n*(m - 1) + m*c = Theta(n*m)
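A quick numerical sanity check of the Theta(n*m) result (my own sketch, taking T(m,n) = 1 as the base case whenever m < 2 or n < 2):

def T(m, n):
    if m < 2 or n < 2:
        return 1
    return 2 * T(m // 2, n) + n

for m, n in [(2**10, 100), (2**14, 100), (2**14, 500)]:
    print(m, n, T(m, n), round(T(m, n) / (m * n), 3))  # the ratio stays bounded near 1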
If you use the Master Theorem by treating n as a constant, then you would easily get T(m,n) = Theta(m*C(n)) with a constant C depending on n, but the Master Theorem does not tell you much about this constant C. If you get too smart and inattentive, you could easily get burned:
T(m,n)=n+2T(m/2,n)=n*(1+2/nT(m/2,n))=n*Theta(2^(log(m/n)))
=n*Theta(m/n)=Theta(m)
And now, because you left out C(n) in the third step, you got a wrong result!

Average Runtime of Quickselect

Wikipedia states that the average runtime of quickselect algorithm (Link) is O(n). However, I could not clearly understand how this is so. Could anyone explain to me (via recurrence relation + master method usage) as to how the average runtime is O(n)?
Because
we already know which partition our desired element lies in.
We do not need to sort (by partitioning) all the elements, but only operate on the partition we need.
As in quicksort, we partition into halves*, and then into halves of a half; but this time we only need to do the next round of partitioning on the single partition (half) in which the element is expected to lie.
It is like (not very accurate)
n + 1/2 n + 1/4 n + 1/8 n + ..... < 2 n
So it is O(n).
* Halving is used for convenience; the actual partition is not exactly 50%.
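A minimal randomized quickselect sketch (my own code, not the answerer's) makes the "recurse into only one side" idea concrete:

import random

def quickselect(a, k):
    # Returns the k-th smallest element of a (k is 0-based), expected O(n) time.
    a = list(a)
    while True:
        if len(a) == 1:
            return a[0]
        pivot = random.choice(a)
        lows   = [x for x in a if x < pivot]
        pivots = [x for x in a if x == pivot]
        highs  = [x for x in a if x > pivot]
        if k < len(lows):
            a = lows                      # recurse into the left side only
        elif k < len(lows) + len(pivots):
            return pivot                  # the pivot is the answer
        else:
            k -= len(lows) + len(pivots)  # recurse into the right side only
            a = highs

print(quickselect([7, 1, 5, 3, 9], 2))   # 5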
To do an average case analysis of quickselect one has to consider, for every pair of elements, how likely it is that the two elements are compared during the algorithm, assuming random pivoting. From this we can derive the average number of comparisons. Unfortunately the analysis I will show requires some longer calculations, but it is a clean average case analysis, as opposed to the current answers.
Let's assume the array we want to select the k-th smallest element from is a random permutation of [1,...,n]. The pivot elements we choose during the course of the algorithm can also be seen as a given random permutation. During the algorithm we always pick the next feasible pivot from this permutation, so the pivots are chosen uniformly at random, as every element has the same probability of occurring as the next feasible element in the random permutation.
There is one simple, yet very important, observation: we compare two elements i and j (with i < j) if and only if one of them is chosen as the first pivot element from the range [min(k,i), max(k,j)]. If another element from this range is chosen first, then they will never be compared, because we continue searching in a sub-array that does not contain at least one of the elements i, j.
Because of the above observation and the fact that the pivots are chosen uniform at random the probability of a comparison between i and j is:
2/(max(k,j) - min(k,i) + 1)
(Two events out of max(k,j) - min(k,i) + 1 possibilities.)
We split the analysis in three parts:
max(k,j) = k, therefore i < j <= k
min(k,i) = k, therefore k <= i < j
min(k,i) = i and max(k,j) = j, therefore i < k < j
In the third case the inequalities are strict because the equality cases are already covered by the first two cases.
Now let's get our hands a little dirty on calculations. We just sum up all the probabilities as this gives the expected number of comparisons.
Case 1
Here max(k,j) = k and min(k,i) = i, so the probability of comparing i and j is 2/(k − i + 1). Summing over all pairs i < j ≤ k gives
Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} 2/(k − i + 1) = Σ_{i=1}^{k−1} 2(k − i)/(k − i + 1) ≤ 2(k − 1) ≤ 2n.
Case 2
Similar to case 1 so this remains as an exercise. ;)
Case 3
Here min(k,i) = i and max(k,j) = j, so the probability is 2/(j − i + 1), and we have to bound
Σ_{i=1}^{k−1} Σ_{j=k+1}^{n} 2/(j − i + 1) = 2 Σ_{i=1}^{k−1} (H_{n−i+1} − H_{k−i+1}),
where H_r is the r-th harmonic number, which grows approximately like ln(r). Grouping the pairs by their distance s = j − i + 1 (for each s there are at most s − 2 such pairs, each contributing 2/s) bounds this sum by 2n as well.
Conclusion
All three cases need a linear number of expected comparisons. This shows that quickselect indeed has an expected runtime in O(n). Note that - as already mentioned - the worst case is in O(n^2).
Note: The idea of this proof is not mine. I think that's roughly the standard average case analysis of quickselect.
If there are any errors please let me know.
In quickselect, as specified, we apply recursion on only one half of the partition.
Average Case Analysis:
First Step: T(n) = cn + T(n/2)
where cn is the time to perform the partition (c is some constant; its exact value doesn't matter), and T(n/2) is the cost of recursing on one half of the partition. Since it's an average-case analysis, we assume the partition element was the median.
As we keep on doing recursion, we get the following set of equation:
T(n/2) = cn/2 + T(n/4)
T(n/4) = cn/4 + T(n/8)
...
T(2) = 2c + T(1)
T(1) = c
Summing the equations and cross-cancelling like values produces a linear result.
c(n + n/2 + n/4 + ... + 2 + 1) ≤ 2cn (sum of a geometric progression)
Hence, it's O(n)
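A tiny numeric check (my own) that the summed work n + n/2 + n/4 + ... indeed stays below 2n:

def total_work(n):
    # Work done at each level of the quickselect recursion, halving each time.
    total = 0
    while n >= 1:
        total += n
        n //= 2
    return total

for n in (10, 1000, 10**6):
    print(n, total_work(n), total_work(n) / n)  # the ratio stays below 2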
I also felt very conflicted at first when I read that the average time complexity of quickselect is O(n), even though we break the list in half each time (like binary search or quicksort). It turns out that breaking the search space in half each time doesn't by itself determine an O(log n) or O(n log n) runtime. What makes quicksort O(n log n) and quickselect O(n) is that quicksort always needs to explore all branches of the recursive tree, while quickselect explores only a single branch. Let's compare the time complexity recurrence relations of quicksort and quickselect to prove my point.
Quicksort:
T(n) = n + 2T(n/2)
= n + 2(n/2 + 2T(n/4))
= n + 2(n/2) + 4T(n/4)
= n + 2(n/2) + 4(n/4) + ... + n(n/n)
= 2^0(n/2^0) + 2^1(n/2^1) + ... + 2^log2(n)(n/2^log2(n))
= n (log2(n) + 1) (since we are adding n to itself log2(n) + 1 times)
Quickselect:
T(n) = n + T(n/2)
= n + n/2 + T(n/4)
= n + n/2 + n/4 + ... + n/n
= n(1 + 1/2 + 1/4 + ... + 1/2^log2(n))
≤ n · 1/(1 − 1/2) = 2n (by the geometric series)
I hope this convinces you why the average runtime of quickselect is O(n).
